Abstract

We consider the maximum independent set problem on sparse graphs with maximum degree $d$.
We show that the integrality gap of the Lovász $\vartheta$-function based SDP is

$\tilde{O}(d/\log^{3/2} d)$.

This improves on the previous best result of $\tilde{O}(d/\log d)$, and almost matches the integrality gap
of $\tilde{O}(d/\log^2 d)$ recently shown for stronger SDPs, namely those obtained using $\mathrm{polylog}(d)$ levels
of the $SA^+$ semidefinite hierarchy. The improvement comes from an improved Ramsey-theoretic
bound on the independence number of $K_r$-free graphs for large values of $r$.

We also show how to obtain an algorithmic version of the above-mentioned $SA^+$-based
integrality gap result, via a coloring algorithm of Johansson. The resulting approximation
guarantee of $\tilde{O}(d/\log^2 d)$ matches the best unique-games-based hardness result up to
lower-order $\mathrm{poly}(\log\log d)$ factors.

1 Introduction


Given a graph $G = (V, E)$, an independent set is a subset of vertices $S$ such that no two vertices in
$S$ are adjacent. The maximum independent set problem is one of the most well-studied problems
in algorithms and graph theory, and its study has led to various remarkable developments such as
the seminal result of Lovász [Lov79] in which he introduced the $\vartheta$-function based on semidefinite
programming, as well as several surprising results in Ramsey theory and extremal combinatorics.

In general graphs, the problem is notoriously hard to approximate. Given a graph on $n$ vertices, the
best known algorithm is due to Feige [Fei04], and achieves an approximation ratio of $\tilde{O}(n/\log^3 n)$;
here $\tilde{O}(\cdot)$ suppresses some $\log\log n$ factors. On the hardness side, a result of Håstad [Has96] shows
that no $n^{1-\epsilon}$ approximation exists for any constant $\epsilon > 0$, assuming NP $\not\subseteq$ ZPP. The hardness has
been improved more recently to $n/\exp((\log n)^{3/4+\epsilon})$ by Khot and Ponnuswami [KP06].

In this paper, we focus on the case of bounded-degree graphs, with maximum degree $d$. Recall
that the naive algorithm (that repeatedly picks an arbitrary vertex $v$ and deletes its neighborhood)
produces an independent set of size at least $n/(d+1)$, and hence is a $(d+1)$-approximation.
The first $o(d)$-approximation was obtained by Halldórsson and Radhakrishnan [HR94], who gave
an $O(d/\log\log d)$ guarantee, based on a Ramsey-theoretic result of Ajtai et al. [AEKS81]. Subsequently,
an $O(d\,\frac{\log\log d}{\log d})$-approximation was obtained independently by several researchers [AK98,
Hal02, Hal00] using the ideas of Karger, Motwani and Sudan [KMS98] to round the natural SDP
for the problem, which was itself based on the Lovász $\vartheta$-function.
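
The naive algorithm recalled above can be sketched in a few lines. This is only an illustration: the adjacency-dictionary representation and the helper names `greedy_independent_set` and `disjoint_cliques` are assumptions made here, not notation from the paper.

```python
def greedy_independent_set(adj):
    """Naive greedy: pick a remaining vertex, then delete it and its neighbors.

    adj maps each vertex to the set of its neighbors (an assumed representation).
    Each round removes at most d + 1 vertices, so at least n / (d + 1) rounds occur.
    """
    remaining = set(adj)
    chosen = []
    while remaining:
        v = min(remaining)          # any vertex works; min() keeps the run deterministic
        chosen.append(v)
        remaining -= adj[v] | {v}   # drop v together with its whole neighborhood
    return chosen


def disjoint_cliques(num_cliques, size):
    """Disjoint union of cliques, the tight example for the n / (d + 1) bound."""
    adj = {}
    for c in range(num_cliques):
        block = set(range(c * size, (c + 1) * size))
        for v in block:
            adj[v] = block - {v}
    return adj
```

On a disjoint union of cliques $K_{d+1}$ the sketch returns exactly one vertex per clique, which is why the $(d+1)$ factor cannot be improved without a better upper bound on the optimum.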

On the negative side, Austrin, Khot and Safra [AKS11] showed an $\Omega(d/\log^2 d)$ hardness of approximation,
assuming the Unique Games Conjecture. Assuming P $\neq$ NP, a hardness of $d/\log^4 d$ was
recently shown by Chan [Cha13]. We remark that these hardness results only seem to hold when $d$
is a constant or a very mildly increasing function of $n$. In fact for $d = n$, the $\Omega(d/\log^2 d)$ hardness
of [AKS11] is inconsistent with the known $\tilde{O}(n/\log^3 n)$ approximation [Fei04]. Hence throughout
this paper, it will be convenient to view $d$ as being a sufficiently large but fixed constant.

Roughly speaking, the gap between the $\Omega(d/\log^2 d)$-hardness and the $\tilde{O}(d/\log d)$-approximation
arises for the following fundamental reason. Approaches based on the SDP work extremely well if
the $\vartheta$-function has value more than $\tilde{O}(n/\log d)$, but not below this threshold. In order to show an
$\Omega(d/\log d)$-hardness result, at the very least, one needs an instance with SDP value around $n/\log d$,
but optimum integral value about $n/d$. While graphs with the latter property clearly exist (e.g.,
a graph consisting of $n/(d+1)$ disjoint cliques $K_{d+1}$), the SDP value for such graphs seems to be
low. In particular, having a large SDP value imposes various constraints on the graph (for example,
it cannot contain many large cliques) which might allow the optimum to be non-trivially larger
than $n/d$, for example due to Ramsey-theoretic reasons.

Recently, Bansal [Ban15] leveraged some of these ideas to improve the approximation guarantee by a
modest $O(\log\log d)$ factor to $d/\log d$ using $\mathrm{polylog}(d)$ levels of the $SA^+$ hierarchy. His improvement
was based on combining properties of the $SA^+$ hierarchies together with the ideas of [AEKS81].
He also showed that the $O(\log^4 d)$-level $SA^+$ relaxation has an integrality gap of $\tilde{O}(d/\log^2 d)$,
where $\tilde{O}(\cdot)$ suppresses some $\log\log d$ factors. The main observation was that the $SA^+$ relaxation
specifies a local distribution on independent sets, and if the relaxation has high objective value then
any $\mathrm{polylog}(d)$-size subset of vertices $X$ must contain a large independent subset.
One can then use a result of Alon [Alo96], in turn based on an elegant entropy-based approach
of Shearer [She95], to show that such graphs have non-trivially large independent sets. However,
this argument is non-algorithmic; it shows that the lifted SDP has a small integrality gap, but does
not give a corresponding approximation algorithm with running time sub-exponential in $n$. This
leads to the question whether this approach can be converted into an approximation algorithm that
outputs a set of size $\tilde{\Omega}(\log^2 d/d)$ times the optimal independent set, or if there is a gap between the
approximability and estimability of this problem (as recently shown for an NP problem by Feige
and Jozeph [FJ14]).

1.1 Our Results. 

Our results resolve some of these questions. For our first result, we consider the standard SDP
relaxation for independent set (without applying any lift-and-project steps) and show that it is
surprisingly more powerful than the guarantee given by Alon and Kahale [AK98] and Halperin [Hal02].

Theorem 1.1 On graphs with maximum degree $d$, the standard $\vartheta$-function-based SDP formulation
for the independent set problem has an integrality gap of $\tilde{O}(d/\log^{3/2} d)$.$^{1}$


The proof of Theorem 1.1 is non-constructive; while it shows that the SDP value is within the
claimed factor of the optimal IS size, it does not give an efficient algorithm to find such an
approximate solution. Finding such an algorithm remains an open question.

The main technical ingredient behind Theorem 1.1 is the following new Ramsey-type result about
the existence of large independent sets in $K_r$-free graphs. This builds on a long line of previous
results in Ramsey theory (some of which we discuss in Section 2), and is of independent interest.
(Recall that $\alpha(G)$ is the maximum independent set size in $G$.)


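Theorem 1.2 For any $r > 0$, if $G$ is a $K_r$-free graph with maximum degree $d$ then

$$\alpha(G) = \Omega\left(\frac{n}{d}\cdot\max\left(\frac{\log d}{r\log\log d},\ \left(\frac{\log d}{\log r}\right)^{1/2}\right)\right). \qquad (1)$$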


Previously, the best known bound for $K_r$-free graphs was $\Omega\left(\frac{n}{d}\cdot\frac{\log d}{r\log\log d}\right)$, given by Shearer [She95].

Observe the dependence on $r$: when $r \ge \frac{\log d}{\log\log d}$, i.e., when we are only guaranteed to exclude very
large cliques, Shearer's result does not give anything better than the trivial $n/d$ bound. It is in this
range of $r \ge \log d$ that the second term in the maximization in (1) starts to perform better and
give a non-trivial improvement. In particular, if $G$ does not contain cliques of size $r = O(\log^{3/2} d)$
(which will be the interesting case for Theorem 1.1), Theorem 1.2 gives a bound of $\tilde{\Omega}\left(\frac{n}{d}(\log d)^{1/2}\right)$.
Even for substantially larger values such as $r = \exp(\log^{1-2\epsilon} d)$, this gives a non-trivial bound of
$\Omega\left(\frac{n}{d}\log^{\epsilon} d\right)$.


Improving on Shearer's bound has been a long-standing open problem in the area, and it is
conceivable that the right answer for $K_r$-free graphs of maximum degree $d$ is $\alpha(G) \ge \frac{n}{d}\cdot\frac{\log d}{\log r}$. This
would be best possible, since in Section 3.1 we give a simple construction showing an upper bound
of $\alpha(G) = O\left(\frac{n}{d}\cdot\frac{\log d}{\log r}\right)$ for $r \ge \log d$, which to the best of our knowledge is the smallest upper bound
currently known. The gap between our lower bound and this upper bound remains an intriguing
one to close; in fact it follows from our proof of Theorem 1.1 that such a lower bound would imply
an $\tilde{O}(d/\log^2 d)$ integrality gap for the standard SDP. Alon [Alo96] shows that this bound is
achievable under the stronger condition that the neighborhood of each vertex is $(r-1)$-colorable.

1 Here and subsequently, $\tilde{O}(\cdot)$ suppresses $\mathrm{poly}(\log\log d)$ factors.

We then turn to the approximation question. Our third result shows how to make Bansal's result
algorithmic, thereby resolving the approximability of the problem (up to lower order $\mathrm{poly}(\log\log d)$
factors), at least for moderate values of $d$.

Theorem 1.3 There is an $\tilde{O}(d/\log^2 d)$-approximation algorithm with running time$^{2}$ $\mathrm{poly}(n)\cdot 2^{O(d)}$,
based on rounding a $d$-level $SA^+$ semidefinite relaxation.

The improvement is simple, and is based on bringing the right tool to bear on the problem. As
in [Ban15], the starting point is the observation that if the $d$-level $SA^+$ relaxation has objective
value at least $n/s$ (and for Theorem 1.3 the value $s = \log^2 d$ suffices), then the neighborhood of every
vertex in the graph is $k$-colorable for $k = s\cdot\mathrm{polylog}(d)$; such graphs are "locally colorable". By Alon's
result mentioned above, such graphs have $\alpha(G) = \Omega\left(\frac{n}{d}\cdot\frac{\log d}{\log k}\right)$. However, instead of using [Alo96],
which relies on Shearer's entropy-based approach and is not known to be constructive, we use an
ingenious and remarkable (and stronger) result of Johansson [Joh96b], who shows that the
list-chromatic number of such locally-colorable graphs is $\chi_\ell(G) = O\left(d\,\frac{\log k}{\log d}\right)$. His result is based on a
very clever application of the Rödl "nibble" method, together with the Lovász Local Lemma to tightly
control the various parameters of the process at every vertex in the graph. Applying Johansson's
result to our problem gives us the desired algorithm.

Unfortunately, Johansson's preprint (back from 1996) was never published, and cannot be found
on the Internet.$^{3}$ For completeness (and to facilitate verification), we give the proof in its entirety
in the Appendix. We essentially follow his presentation, but streamline some arguments based on
recent developments such as concentration bounds for low-degree polynomials of random variables,
and the algorithmic version of the Local Lemma. His manuscript contains many other results that build
upon and make substantial progress on a long line of work (we give more details in Section 2). We
hope that this will make Johansson's ideas and results accessible to a wider audience. (Johansson's
previous preprint [Joh96a] showing the analogous list-coloring result for triangle-free graphs is also
unavailable publicly, but is presented in the graph coloring book by Molloy and Reed [MR02], and
has received considerable attention since, both in the math [AKS99, Vu02, FM13] and computer
science communities [GP00, CPS14].)

The proof of Theorem 1.3 also implies the following new results about the LP-based Sherali-Adams
(SA) hierarchies, without any SDP constraints.

Corollary 1.4 The LP relaxation with clique constraints on sets of size up to $\log d$ (and hence the
relaxation $SA_{(\log d)}$) has an integrality gap of $\tilde{O}(d/\log d)$. Moreover, the relaxation $SA_{(d)}$ can be
used to find an independent set achieving an $\tilde{O}(d/\log d)$ approximation in time $\mathrm{poly}(n)\cdot 2^{O(d)}$.

Since LP-based relaxations have traditionally been found to be very weak for the independent
set problem, it may be somewhat surprising that a few rounds of the SA hierarchy improve the
integrality gap by a non-trivial amount.

2 While a $d$-level $SA^+$ relaxation has size $n^{O(d)}$ in general, our relaxation only uses variables corresponding to
subsets of vertices that lie in the neighborhood of some vertex $v$, and thus has $n\cdot 2^{O(d)}$ variables.

3 We thank Alan Frieze for sharing a copy with us.

All our results extend to the case when $d$ is the average degree of the graph, by first deleting the
(at most $n/2$) vertices with degree more than $2d$ and then applying the results.
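
The reduction from average to maximum degree is a direct application of Markov's inequality. A minimal sketch, assuming an adjacency-dictionary representation of a non-empty graph (the function name `prune_high_degree` is hypothetical):

```python
def prune_high_degree(adj):
    """Delete every vertex whose degree exceeds twice the average degree.

    adj maps each vertex to the set of its neighbors (an assumed representation).
    By Markov's inequality fewer than half of the vertices are deleted, and the
    surviving graph has maximum degree at most twice the original average degree.
    """
    n = len(adj)
    avg = sum(len(nbrs) for nbrs in adj.values()) / n
    keep = {v for v, nbrs in adj.items() if len(nbrs) <= 2 * avg}
    # Restrict every surviving neighborhood to the surviving vertices.
    return {v: adj[v] & keep for v in keep}
```

Any independent set found in the pruned graph is also independent in the original graph, so a guarantee in terms of the pruned maximum degree carries over with only constant-factor loss.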

2 Preliminaries 

Given the input graph $G = (V, E)$, we will denote the vertex set $V$ by $[n] = \{1,\ldots,n\}$. Let $\alpha(G)$
denote the size of a maximum independent set in $G$, and $d$ denote the maximum degree in $G$. The
naive greedy algorithm implies $\alpha(G) \ge n/(d+1)$ for every $G$. As the greedy guarantee is tight
in general (e.g., if the graph is a disjoint union of $n/(d+1)$ copies of the clique $K_{d+1}$), the trivial
upper bound of $\alpha(G) \le n$ cannot give an approximation better than $d+1$ and hence stronger upper
bounds are needed. A natural bound is the clique-cover number $\bar{\chi}(G)$, defined as the minimum
number of vertex-disjoint cliques needed to cover $V$. As any independent set can contain at most
one vertex from any clique, $\alpha(G) \le \bar{\chi}(G)$.

Standard LP/SDP Relaxations. In the standard LP relaxation for the independent set problem,
there is a variable $x_i$ for each vertex $i$ that is intended to be 1 if $i$ lies in the independent set
and 0 otherwise. The LP is the following:

$$\max \sum_i x_i, \quad \text{s.t.}\quad x_i + x_j \le 1 \ \ \forall (i,j) \in E, \quad\text{and}\quad x_i \in [0,1] \ \ \forall i \in [n]. \qquad (2)$$

Observe that this linear program is very weak, and cannot give an approximation better than
$(d+1)/2$: even if the graph consists of $n/(d+1)$ copies of $K_{d+1}$, the solution $x_i = 1/2$ for each $i$
is a feasible one.
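
The weakness of the edge LP on disjoint cliques can be checked directly with exact arithmetic. The sketch below is illustrative only; the function name `lp_gap_on_disjoint_cliques` and its return convention are assumptions made here.

```python
from fractions import Fraction


def lp_gap_on_disjoint_cliques(num_cliques, d):
    """Check the all-halves solution of the edge LP on disjoint copies of K_{d+1}.

    Returns (feasible, ratio), where ratio is the LP value of x_i = 1/2 divided by
    the true independence number (one vertex per clique).
    """
    size = d + 1
    n = num_cliques * size
    edges = [
        (c * size + a, c * size + b)
        for c in range(num_cliques)
        for a in range(size)
        for b in range(a + 1, size)
    ]
    x = [Fraction(1, 2)] * n
    feasible = all(x[i] + x[j] <= 1 for i, j in edges) and all(0 <= xi <= 1 for xi in x)
    lp_value = sum(x)        # equals n / 2
    alpha = num_cliques      # one vertex can be taken from each clique
    return feasible, lp_value / alpha
```

For every choice of `num_cliques` the ratio is $(d+1)/2$, matching the gap stated in the text.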

In the standard SDP relaxation, there is a special unit vector $v_0$ (intended to indicate 1) and a
vector $v_i$ for each vertex $i$. The vector $v_i$ is intended to be $v_0$ if $i$ lies in the independent set and
be $0$ otherwise. This gives the following relaxation:

$$\max \sum_i v_i \cdot v_0, \quad \text{s.t.}\quad v_0 \cdot v_0 = 1, \quad v_0 \cdot v_i = v_i \cdot v_i \ \ \forall i \in [n], \quad\text{and}\quad v_i \cdot v_j = 0 \ \ \forall (i,j) \in E. \qquad (3)$$

Let $Y$ denote the $(n+1) \times (n+1)$ Gram matrix with entries $y_{ij} = v_i \cdot v_j$, for $i, j \in \{0,\ldots,n\}$. Then
we have the equivalent relaxation

$$\max \sum_i y_{0i}, \quad \text{s.t.}\quad y_{00} = 1, \quad y_{0i} = y_{ii} \ \ \forall i \in [n], \quad y_{ij} = 0 \ \ \forall (i,j) \in E \quad\text{and}\quad Y \succeq 0. \qquad (4)$$

The above SDP, which is equivalent to the well-known $\vartheta$-function of Lovász [Lau] (Lemma 3.4.4),
satisfies $\alpha(G) \le \vartheta(G) \le \bar{\chi}(G)$. The $O\left(d\,\frac{\log\log d}{\log d}\right)$ approximations due to [AK98, Hal02, Hal00] are
all based on SDPs.

We will use the following important result due to Halperin [Hal02] about the performance of the
SDP. The form below differs slightly from the one in [Hal02] as he works with a $\{-1,1\}$ formulation.
A proof for the form below can be found in [Ban15, Theorem 3.1].


Theorem 2.1 (Halperin [Hal02], Lemma 5.2) Let $\eta \in [0, \frac{1}{2}]$ be a parameter and let $Z$ be the
collection of vectors $v_i$ satisfying $\|v_i\|^2 \ge \eta$ in the SDP solution. Then there is an algorithm that
returns an independent set of size

$$\Omega\left(\frac{d^{2\eta}}{d\sqrt{\ln d}}\,|Z|\right).$$

Note that if $\eta = c\log\log d/\log d$, then for $c \le 1/4$ Theorem 2.1 does not return any non-trivial
independent set. On the other hand, for $c > 1/4$ the size of the independent set returned rises
exponentially fast with $c$.
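
The threshold $c = 1/4$ can be read off by substituting $\eta = c\log\log d/\log d$ into the bound $\Omega\big(\frac{d^{2\eta}}{d\sqrt{\ln d}}|Z|\big)$ of Theorem 2.1. The short calculation below takes all logarithms to base $e$, which is an assumption made here for concreteness (other bases only change constants in the exponent).

```latex
\[
  d^{2\eta} \;=\; \exp\!\Big(2\cdot\frac{c\,\ln\ln d}{\ln d}\cdot \ln d\Big) \;=\; (\ln d)^{2c},
  \qquad\text{so}\qquad
  \frac{d^{2\eta}}{d\sqrt{\ln d}}\,|Z| \;=\; \frac{|Z|}{d}\,(\ln d)^{\,2c-\frac12}.
\]
```

For $c \le 1/4$ the exponent $2c - \tfrac12$ is non-positive, so the guarantee is no better than the trivial $|Z|/d$; for $c > 1/4$ the gain over $|Z|/d$ is a power of $\ln d$ whose exponent grows linearly in $c$.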

For more details on SDPs, and the Lovász $\vartheta$-function, we refer the reader to [GLS88, GM12].


Lower Bounds on the Independence Number. As SDPs can handle cliques, looking at $\vartheta(G)$
naturally leads to Ramsey theoretic considerations. In particular, if $\vartheta(G)$ is small then the trivial
$n/(d+1)$ solution already gives a good approximation. Otherwise, if $\vartheta(G)$ is large, then this
essentially means that there are no large cliques and one must argue that a large independent set
exists (and can be found efficiently).

For bounded degree graphs, a well-known result of this type is that $\alpha(G) = \Omega\left(n\,\frac{\log d}{d}\right)$ for
triangle-free graphs [AKS80, She83] (i.e., if there are no cliques of size 3). A particularly elegant proof
(based on an idea due to Shearer [She95]) is in [AS92]. Moreover this bound is tight, and simple
probabilistic constructions show that this bound cannot be improved even for graphs with large
girth.


For the case of $K_r$-free graphs with $r \ge 4$, the situation is less clear. Ajtai et al. [AEKS81] showed
that $K_r$-free graphs have $\alpha(G) = \Omega(n(\log(\log d/r))/d)$, which implies that $\alpha(G) = \Omega(n\log\log d/d)$
for $r \ll \log d$. This result was the basis of the $O(d/\log\log d)$ approximation due to [HR94].
Shearer [She95] improved this result substantially and showed that $\alpha(G) \ge \Omega\left(\frac{n}{d}\cdot\frac{\log d}{r\log\log d}\right)$ for
$K_r$-free graphs. His result is based on an elegant entropy-based approach that has subsequently found
many applications. However, it is not known how to make this method algorithmic. Removing the
$\log\log d$ factor above is a major open question, even for $r = 4$. Also, note that his bound is trivial
when $r \ge \frac{\log d}{\log\log d}$.


Interestingly, this result also implies another (non-algorithmic) proof that the SDP has integrality
gap $d\,\frac{\log\log d}{\log d}$. In particular, if the SDP objective is about $n/r$ this essentially implies that the graph
is $K_r$-free (as roughly each vertex contributes about $x_i = 1/r$). Thus, by Shearer's bound the
integrality gap is $(n/r)/\alpha(G) \le d\,\frac{\log\log d}{\log d}$. It is interesting to note that both Halperin's approach
and Shearer's bound seem to get stuck at the same point.


Alon [Alo96] generalized the triangle-free result in a different direction, also using the entropy
method. He considered locally $k$-colorable graphs, where the neighborhood of every vertex is
$k$-colorable, and showed that $\alpha(G) = \Omega\left(\frac{n}{d}\cdot\frac{\log d}{\log(k+1)}\right)$. Note that triangle-free graphs are locally
1-colorable. This result also holds under weaker conditions, and plays a key role in the results of
[Ban15] on bounding the integrality gap of $SA^+$ relaxations.


Bounds on the chromatic number. Most of the above results generalize to the much more
demanding setting of list coloring. All of them are based on the "nibble" method, but require
increasingly sophisticated ideas. The intuition for why $O(d/\log d)$ arises can be seen via a
coupon-collector argument: if each vertex in the neighborhood $N(v)$ chooses a color from $s$ colors
independently and u.a.r., they will use up all $s$ colors unless $d \le O(s\log s)$, or $s \ge \Omega(d/\log d)$. (Of
course, the colors at the neighbors are not chosen uniformly or independently.) Kim showed that
$\chi_\ell(G) = O(d/\log d)$ for graphs with girth at least 5 [Kim95]. His idea was that for any $v$, and
$u, w \in N(v)$, $N(u) \cap N(w) = \{v\}$ because of the girth, and hence the available colors at $u, w$ evolve
essentially independently, and hence conform to the intuition.
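
The coupon-collector intuition can be made quantitative under the idealized assumption, flagged in the text as unrealistic, that the $d$ neighbors pick colors independently and uniformly. The two helper names below are hypothetical and serve only as an illustration.

```python
import math


def expected_unused_colors(s, d):
    """Expected number of the s colors missed by d independent uniform choices."""
    return s * (1 - 1 / s) ** d


def neighbors_needed_to_exhaust(s):
    """Smallest integer d >= s ln s; beyond it fewer than one color is left on average."""
    return math.ceil(s * math.log(s))
```

With $s = 100$ colors, $d = 100$ neighbors still leave more than a third of the colors unused in expectation, whereas about $s\ln s \approx 461$ neighbors leave less than one, which is the $d \approx s\log s$ threshold behind the $d/\log d$ bound.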

These ideas fail for triangle-free graphs (of girth 4): we could have a vertex $v$, with $u, w \in N(v)$,
and $N(u) = N(w)$ (i.e., all their neighbors are common). In this case the lists of available colors
at $u$ and $w$ are far from independent: they would be completely identical. Johansson [Joh96a] had
the crucial insight that this positive correlation is not a problem, since there is no edge between
$u$ and $w$ (because of triangle-freeness!). His clever proof introduced the crucial notions of entropy
and energy to capture and control the positive correlation along edges in such $K_3$-free graphs.

If there are triangles, say if the graphs are only locally $k$-colorable, then using these ideas naively
fails. The next key idea, also introduced by Johansson [Joh96b], is to modify the
standard nibble process by introducing a probability reshuffling step at each vertex depending on
its local graph structure, which makes the process more complicated. In Section 6, we give his result for
locally-colorable graphs in its entirety.

Lift-and-project Hierarchies. An excellent introduction to hierarchies and their algorithmic
uses can be found in [CT12, Lau03]. Here, we describe only the most basic facts that we need.

The Sherali-Adams (SA) hierarchy defines a hierarchy of linear programs with increasingly tighter
relaxations. At level $t$, there is a variable $Y_S$ for each subset $S \subseteq [n]$ with $|S| \le t+1$. Intuitively,
one views $Y_S$ as the probability that all the variables in $S$ are set to 1. Such a solution can be
viewed as specifying a local distribution over valid $\{0,1\}$-solutions for each set $S$ of size at most
$t+1$. A formal description of the $t$-round Sherali-Adams LP $SA_{(t)}$ for the independent set problem
can be found in [CT12, Lemma 1].

For our purposes, we will also impose the PSD constraint on the variables $y_{ij}$ at the first level (i.e.,
we add the constraints in (4) on the $y_{ij}$ variables). We will call this the $t$-level $SA^+$ formulation and
denote it by $SA^+_{(t)}$. To keep the notation consistent with the LP (2), we will use $x_i$ to denote the
marginals $y_{ii}$ on singleton vertices.


3 Integrality Gap 

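In this section, we show Theorem 1.1, that the integrality gap of the standard Lovász $\vartheta$-function
based SDP relaxation is $\tilde{O}(d/\log^{3/2} d)$.

To show this we prove the following result (which is Theorem 1.2, restated):

Theorem 3.1 Let $G$ be a $K_r$-free graph with maximum degree $d$. Then

$$\alpha(G) = \Omega\left(\frac{n}{d}\cdot\max\left(\frac{\log d}{r\log\log d},\ \left(\frac{\log d}{\log r}\right)^{1/2}\right)\right).$$

In particular, for $r = \log^c d$ with $c \ge 1$, we get $\alpha(G) = \Omega\left(\frac{n}{d}\left(\frac{\log d}{c\log\log d}\right)^{1/2}\right)$.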

We need the following basic facts. The first follows from a simple counting argument (see [Alo96, 
Lemma 2.2] for a proof). 

Lemma 3.2 Let $F$ be a family of $2^{\epsilon x}$ distinct subsets of an $x$-element set $X$. Then the average
size of a member of $F$ is at least $\epsilon x/(10\log(1 + 1/\epsilon))$.

Fact 3.3 Let $G$ be a $K_r$-free graph on $x$ vertices, then

$$\alpha(G) \ge \max\left(\frac{x^{1/r}}{2},\ \frac{\log x}{\log(2r)}\right).$$

Note that the latter bound is stronger when $r$ is large, i.e., roughly when $r \ge \log x/\log\log x$.


Proof: Let $R(s,t)$ denote the off-diagonal $(s,t)$-Ramsey number, defined as the smallest number $n$
such that any graph on $n$ vertices contains either an independent set of size $s$ or a clique of size $t$.

It is well known that $R(s,t) \le \binom{s+t-2}{s-1}$ [ES35]. Approximating the binomial gives us the bounds
$R(s,t) \le (2s)^t$ and $R(s,t) \le (2t)^s$; the former is useful for $t \le s$ and the latter for $s \le t$. If
we set $R(s,t) = x$ and $t = r$, the first bound gives $s \ge (1/2)x^{1/r}$ and the second bound gives
$s \ge \log x/\log(2r)$. ■
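
The two simplified bounds used in the proof can be sanity-checked numerically against the Erdős–Szekeres binomial $\binom{s+t-2}{s-1}$. The sketch below is only such a check for small parameters; the function names are hypothetical.

```python
from math import comb


def erdos_szekeres_bound(s, t):
    """The classical upper bound C(s + t - 2, s - 1) on the Ramsey number R(s, t)."""
    return comb(s + t - 2, s - 1)


def bounds_hold(limit):
    """Check C(s+t-2, s-1) <= min((2s)^t, (2t)^s) for all 1 <= s, t <= limit."""
    return all(
        erdos_szekeres_bound(s, t) <= min((2 * s) ** t, (2 * t) ** s)
        for s in range(1, limit + 1)
        for t in range(1, limit + 1)
    )
```

Solving $x \le (2s)^r$ and $x \le (2r)^s$ for $s$ then gives the two terms $x^{1/r}/2$ and $\log x/\log(2r)$ of Fact 3.3.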

We will be interested in lower bounding the number of independent sets $\mathcal{I}$ in a $K_r$-free graph.
Clearly, $\mathcal{I} \ge 2^{\alpha(G)}$ (consider every subset of a maximum independent set). However the following
improved estimate will play a key role in Theorem 3.1. Roughly speaking it says that if $\alpha(G)$ is
small, in particular of size logarithmic in $x$, then the independent sets are spread all over $G$, and
hence their number is close to $x^{\Omega(\log x/\log r)}$.

Theorem 3.4 Let $G$ be a $K_r$-free graph on $x$ vertices, and let $\mathcal{I}$ denote the number of independent
sets in $G$. Then we have

$$\log \mathcal{I} \ge \max\left(\frac{x^{1/r}}{2},\ \frac{\log^2 x}{18\log 2r}\right).$$

Proof: The first bound follows trivially from Fact 3.3, and hence we focus on the second bound.
Also, assume $r \ge 3$ and $x \ge 64$ else the second bound is trivial.

Define $s := \log x/\log(2r)$. Let $G'$ be the graph obtained by sampling each vertex of $G$ independently
with probability $p := 2/x^{1/2}$. The expected number of vertices in $G'$ is $px = 2x^{1/2}$. Let $\mathcal{G}$ denote
the good event that $G'$ has at least $x^{1/2}$ vertices. Clearly, $\Pr[\mathcal{G}] \ge 1/2$ (in fact it is exponentially
close to 1). Since the graph $G'$ is also $K_r$-free, conditioned on the event $\mathcal{G}$, it has an independent
set of size at least $\log(x^{1/2})/\log(2r) = s/2$. Thus the expected number of independent sets of size
$s/2$ in $G'$ is at least $1/2$.

Now consider some independent set $Y$ of size $s/2$ in $G$. The probability that $Y$ survives in $G'$ is
exactly $p^{s/2}$. As the expected number of independent sets of size $s/2$ in $G'$ is at least $1/2$, it follows
that $G$ must contain at least $(1/2)(1/p^{s/2})$ independent sets of size $s/2$. This gives us that

$$\log \mathcal{I} \ \ge\ \frac{s}{2}\log\left(\frac{1}{p}\right) - 1 \ \ge\ \frac{s}{2}\log\left(\frac{x^{1/2}}{2}\right) - 1 \ \ge\ \frac{s}{18}\log x,$$

where the last inequality assumes that $x$ is large enough. ■

We are now ready to prove Theorem 3.1. 

Proof: We can assume that $d \ge 16$, else the claim is trivial. Our arguments follow the probabilistic
approach of [She95, Alo96]. Let $W$ be a random independent set of vertices in $G$, chosen uniformly
among all independent sets in $G$. For each vertex $v$, let $X_v$ be a random variable defined as

$$X_v = d\,|\{v\} \cap W| + |N(v) \cap W|.$$

Observe that $|W|$ can be written as $\sum_v |\{v\} \cap W|$; moreover, it satisfies $|W| \ge (1/d)\sum_v |N(v) \cap W|$,
since a vertex in $W$ can be in at most $d$ sets $N(v)$. Hence we have that

$$\alpha(G) \ \ge\ \mathbb{E}[|W|] \ \ge\ \frac{1}{2d}\sum_v \mathbb{E}[X_v].$$

Let $\gamma = \max\left(\frac{\log d}{r\log\log d},\ \left(\frac{\log d}{\log r}\right)^{1/2}\right)$ denote the improvement factor in Theorem 3.1 over the trivial
bound of $n/d$. Thus to show that $\alpha(G)$ is large, it suffices to show that

$$\mathbb{E}[X_v] \ge c\gamma \qquad (5)$$

for each vertex $v$ and some fixed constant $c$.

In fact, we show that (5) holds for every conditioning of the choice of the independent set in
$V - (N(v) \cup \{v\})$. In particular, let $H$ denote the subgraph of $G$ induced on $V - (N(v) \cup \{v\})$. For
each possible independent set $S$ in $H$, we will show that

$$\mathbb{E}[X_v \mid W \cap V(H) = S] \ge c\gamma.$$

Fix a choice of $S$. Let $X$ denote the non-neighbors of $S$ in $N(v)$, and let $x = |X|$. Let $\epsilon$ be such
that $2^{\epsilon x}$ denotes the number of independent sets in the induced subgraph $G[X]$. Now, conditioning
on the intersection $W \cap V(H) = S$, there are precisely $2^{\epsilon x} + 1$ possibilities for $W$: one in which
$W = S \cup \{v\}$, and $2^{\epsilon x}$ possibilities in which $v \notin W$ and $W$ is the union of $S$ with an independent
set in $G[X]$.

By Lemma 3.2, the average size of an independent set in $X$ is at least $\frac{\epsilon x}{10\log(1/\epsilon + 1)}$, and thus we have
that

$$\mathbb{E}[X_v \mid W \cap V(H) = S] \ \ge\ d\cdot\frac{1}{2^{\epsilon x} + 1} + \frac{\epsilon x}{10\log(1/\epsilon + 1)}\cdot\frac{2^{\epsilon x}}{2^{\epsilon x} + 1}. \qquad (6)$$

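Now, if $2^{\epsilon x} + 1 \le \sqrt{d}$, then the first term is at least $\sqrt{d}$, and we've shown (5) with room to spare.
So we can assume that $\epsilon x \ge (1/2)\log d$. Moreover, by Theorem 3.4,

$$\epsilon x \ \ge\ \max\left(\frac{x^{1/r}}{2},\ \frac{\log^2 x}{18\log(2r)}\right),$$

and hence the right hand side in (6) is at least

$$\frac{1}{40\log(1/\epsilon + 1)}\max\left(\frac{\log d}{2},\ \frac{x^{1/r}}{2},\ \frac{\log^2 x}{18\log(2r)}\right) \ \ge\ \frac{1}{40\log(x + 1)}\max\left(\frac{\log d}{2},\ \frac{x^{1/r}}{2},\ \frac{\log^2 x}{18\log(2r)}\right), \qquad (7)$$

where the inequality uses $\epsilon \ge 1/x$ (since $\epsilon x \ge (1/2)\log d \ge 1$).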

First, let's consider the first two expressions in (7). If $x \ge \log^r d$, then as $x^{1/r}/\log(x + 1)$ is
increasing in $x$, it follows that the right hand side of (7) is at least

$$\frac{x^{1/r}}{80\log(x + 1)} = \Omega\left(\frac{\log d}{r\log\log d}\right).$$

On the other hand if $x < \log^r d$, then we have that the right hand side is again at least

$$\frac{1}{40\log(x + 1)}\cdot\frac{\log d}{2} = \Omega\left(\frac{\log d}{r\log\log d}\right).$$

Now, consider the first and third expressions in (7). Using the fact that $\max(a,b) \ge \sqrt{ab}$ with
$a = (\log d)/2$ and $b = (\log^2 x)/(18\log 2r)$, we get that (7) is at least $\Omega\left(\left(\frac{\log d}{\log r}\right)^{1/2}\right)$. Hence, for every
value of $x$ we get that (7) is at least $\Omega(\gamma)$ as desired in (5); this completes the proof of Theorem 3.1.


We can now show the main result of this section. 

Theorem 3.5 The standard SDP for independent set has an integrality gap of $O\left(d\left(\frac{\log\log d}{\log d}\right)^{3/2}\right)$.

Proof: Given a graph $G$ on $n$ vertices, let $\beta \in [0,1]$ be such that the SDP on $G$ has objective
value $\beta n$. If $\beta \le 2/\log^{3/2} d$, the naive greedy algorithm already implies a $d/\log^{3/2} d$ approximation.
Thus, we will assume that $\beta \ge 2/\log^{3/2} d$.

Let us delete all the vertices that contribute $x_i \le \beta/2$ to the objective. The residual graph has
objective value at least $\beta n - (\beta/2)n = \beta n/2$.

Let $\eta = 2\log\log d/\log d$. If there are more than $n/\log^2 d$ vertices with $x_i \ge \eta$, applying Theorem 2.1
to the collection of these vertices already gives an independent set of size at least

$$\Omega\left(\frac{d^{2\eta}}{d\sqrt{\ln d}}\cdot\frac{n}{\log^2 d}\right) = \Omega\left(\frac{n\log^{3/2} d}{d}\right),$$

and hence an $O(d/\log^{3/2} d)$ approximation.

Thus we can assume that fewer than $n/\log^2 d$ vertices have $x_i \ge \eta$. As each vertex can contribute at
most 1 to the objective, the SDP objective on the residual graph obtained by deleting the vertices
with $x_i \ge \eta$ is at least $\beta n/2 - n/\log^2 d$, which is at least $\beta n/3$, since $\beta \ge 2/\log^{3/2} d$.
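
The last inequality uses that $d$ is a sufficiently large constant; spelled out, the arithmetic is as follows.

```latex
\[
  \frac{\beta n}{2} - \frac{n}{\log^2 d} \;\ge\; \frac{\beta n}{3}
  \iff \frac{n}{\log^2 d} \;\le\; \frac{\beta n}{6}
  \iff \beta \;\ge\; \frac{6}{\log^2 d},
\]
\[
  \text{and}\qquad
  \beta \;\ge\; \frac{2}{\log^{3/2} d} \;\ge\; \frac{6}{\log^2 d}
  \quad\text{whenever}\quad \log^{1/2} d \ge 3,\ \text{i.e. } \log d \ge 9 .
\]
```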

So we have a feasible SDP solution on a subgraph $G'$ of $G$, where the objective is at least $\beta n/3$
(here $n$ is the number of vertices in $G$ and not $G'$) and each surviving vertex $i$ has value $x_i$ in the
range $[\beta/2, \eta]$.

As $x_i \le \eta$ for each $i$, and the SDP objective is at least $\beta n/3$, the number of vertices $n'$ in $G'$
satisfies $n' \ge (\beta n/3)/\eta = \Omega(n\beta/\eta)$. Moreover, as $x_i \ge \beta/2$ for each vertex $i \in G'$, and the SDP
does not put more than one unit of probability mass on any clique, it follows that $G'$ is $K_r$-free for
$r = 2/\beta \le \log^{3/2} d$. Applying Theorem 3.1 to $G'$ with parameter $r = \log^{3/2} d$, we obtain that $G'$
has an independent set of size

$$\Omega\left(\frac{n'}{d}\left(\frac{\log d}{\log r}\right)^{1/2}\right) = \Omega\left(\frac{\beta n}{d\,\eta}\left(\frac{\log d}{\log\log d}\right)^{1/2}\right) = \Omega\left(\frac{\beta n}{d\,\eta^{3/2}}\right).$$

The SDP objective for $G$ was $\beta n$, so the integrality gap is $O(d\,\eta^{3/2}) = O\left(d\left(\frac{\log\log d}{\log d}\right)^{3/2}\right)$. ■

3.1 An upper bound


We give a simple construction showing that $\alpha(G) = O\left(\frac{n}{d}\cdot\frac{\log d}{\log r}\right)$ for $r \ge \log d$. We use the standard lower bound

R(s,t) = for off-diagonal Ramsey numbers for t > s. Setting t = r with r > logd, it follows 

that there exist $K_r$-free graphs $H$ on $d$ vertices such that $\alpha(H) = O(\log d/\log r)$. Now set $G$ to be
$n/d$ disjoint copies of $H$.

4 An Algorithm using Lift-and-Project 

In this section, we briefly illustrate how to make Bansal's argument about the integrality gap of
the lifted SDP [Ban15] algorithmic. Consider the $SA^+_{(d)}$ relaxation on $G$, and let $\mathrm{sdp}(G)$ denote its
value. We can assume that

$$\mathrm{sdp}(G) \ge n/\log^2 d, \qquad (8)$$

otherwise the naive algorithm already gives a $d/\log^2 d$ approximation.

Let $\eta = 3\log\log d/\log d$, and $Z$ denote the set of vertices $i$ with $x_i \ge \eta$. We can assume
that $|Z| \le n/(4\log^2 d)$, otherwise applying Theorem 2.1 gives an independent set of size
$\Omega(|Z| \cdot d^{2\eta}/(d\sqrt{\log d})) = \Omega(n\log^2 d/d)$. Applying Theorem 2.1 is fine, since our solution belongs to $SA^+$
and hence is a valid SDP solution. Hence,

$$\mathrm{sdp}(G) \ \le\ |Z| \cdot 1 + (n - |Z|) \cdot \eta \ \le\ (n/(4\log^2 d)) \cdot 1 + n \cdot \eta \ \le\ 2\eta n.$$

Let $V'$ denote the set of vertices $i$ with $x_i \in [1/(4\log^2 d), \eta]$.

Claim 4.1 $|V'| \ge \mathrm{sdp}(G)/(2\eta)$.

Proof: The total contribution to $\mathrm{sdp}(G)$ of vertices $i$ with $x_i \le 1/(4\log^2 d)$ can be at most
$n/(4\log^2 d)$, which by (8) is at most $\mathrm{sdp}(G)/4$. Similarly, the contribution of vertices in $Z$ is at
most $|Z|$, which is again at most $\mathrm{sdp}(G)/4$. Together this gives $\mathrm{sdp}(G') \ge \mathrm{sdp}(G)/2$, where $\mathrm{sdp}(G')$
denotes the contribution of the vertices in $V'$. As each vertex in $V'$ has $x_i \le \eta$, the claim follows. ■

Lemma 4.2 The graph $G' = G[V']$ induced on $V'$ is locally $k$-colorable for $k = O(\log^3 d)$.

Proof: Consider the solution $SA^+_{(d)}$ restricted to $G'$. For a vertex $v \in V'$, let $N(v)$ denote its
neighborhood in $G'$. As $|N(v)| \le d$ and $x_i \ge 1/(4\log^2 d)$ for all $i \in N(v)$, the $SA^+_{(d)}$ solution
defines a "local distribution" $\{X_S\}_{S \subseteq N(v)}$ over subsets of each neighborhood with the following
properties:

(i) $X_S \ge 0$ and $\sum_{S \subseteq N(v)} X_S = 1$,

(ii) $X_S > 0$ only if $S$ is independent in the subgraph induced on $N(v)$, and

(iii) for each vertex $i \in N(v)$, it holds that

$$x_i = \sum_{S \subseteq N(v):\, i \in S} X_S \ \ge\ 1/(4\log^2 d).$$

Scaling up the solution $X_S$ by $4\log^2 d$ thus gives a valid fractional coloring of $N(v)$ using $4\log^2 d$
colors, which by a set-covering argument implies that $\chi(N(v)) = O(\log^2 d \cdot \log|N(v)|) = O(\log^3 d)$.
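
The set-covering step can be illustrated by a greedy rounding of a local distribution into color classes. This is a sketch of one standard way to carry out such an argument, not necessarily the one intended in the paper; the function name and the input format (a list of `(frozenset, weight)` pairs) are assumptions.

```python
def greedy_cover_coloring(vertices, weighted_sets):
    """Turn a distribution over independent sets into a proper coloring.

    weighted_sets is a list of (frozenset, weight) pairs (an assumed format).
    If every vertex lies in a random set with probability at least q, then by
    averaging some support set covers a q-fraction of the uncovered vertices,
    so greedy stops within about ln(m) / q rounds for m vertices.
    """
    support = [s for s, w in weighted_sets if w > 0]
    uncovered = set(vertices)
    classes = []
    while uncovered:
        best = max(support, key=lambda s: len(s & uncovered))
        newly = best & uncovered
        if not newly:
            raise ValueError("some vertex lies in no set of the support")
        classes.append(newly)   # a subset of an independent set is independent
        uncovered -= newly
    return classes
```

With $q = 1/(4\log^2 d)$ and $m = |N(v)| \le d$ this yields $O(\log^2 d \cdot \log d)$ color classes, consistent with the $O(\log^3 d)$ bound in the lemma.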


Using Johansson's coloring algorithm for locally $k$-colorable graphs (Theorem 6.1) we can find an
independent set of $G'$ with size

$$\mathrm{alg}(G') = \Omega\left(\frac{|V'|}{d}\cdot\frac{\log d}{\log(k+1)}\right).$$

Using $k = O(\log^3 d)$ and Claim 4.1 this implies an algorithm to find independent sets in degree-$d$
graphs, with an integrality gap of

$$\frac{\mathrm{sdp}(G)}{\mathrm{alg}(G')} = O\left(\frac{d\,\eta\,\log(k+1)}{\log d}\right) = \tilde{O}\left(\frac{d}{\log^2 d}\right).$$

Our algorithm only required a fractional coloring on the neighborhood of vertices. Since there are
at most $2^d$ independent sets in each neighborhood, there are at most $n \cdot 2^d$ relevant variables in our
SDP. Hence, we can compute the relevant fractional coloring in time $\mathrm{poly}(n) \cdot 2^{O(d)}$.


5 LP-based guarantees 


We prove Corollary 1.4. Consider the standard LP (2) strengthened by the clique inequalities
$\sum_{i \in C} x_i \le 1$ for each clique $C$ with $|C| \le \log d$. As each clique lies in the neighborhood of some
vertex, the number of such cliques is at most $n \cdot \binom{d}{\log d}$. Let $\beta n$ denote the objective value of this
LP relaxation. We assume that $\beta \ge 2/\log d$, otherwise the naive algorithm already gives a $d/\log d$
approximation.

Let $B_0$ denote the set of vertices with $x_i \le 1/\log d \le \beta/2$. For $j = 1, \ldots, k$, where $k = \log\log d$,
let $B_j$ denote the set of vertices with $x_i \in (2^{j-1}/\log d,\ 2^j/\log d]$. Note that
$\sum_{j \ge 1} \sum_{i \in B_j} x_i = \beta n - \sum_{i \in B_0} x_i \ge \beta n/2$, and thus there exists some index $j$ such that
$\sum_{i \in B_j} x_i \ge \beta n/(2k)$.

Let $\gamma = 2^{j-1}/\log d$; for each $i \in B_j$, $x_i \in (\gamma, 2\gamma]$. Since $x_i > \gamma$ for each $i \in B_j$, the clique constraints
ensure that the graph induced on $B_j$ is $K_r$-free for $r = 1/\gamma$. Moreover, since $x_i \le 2\gamma$ for each $i \in B_j$,
$|B_j| \ge \beta n/(4\gamma k)$. By Shearer’s result for $K_r$-free graphs we obtain

$$\alpha(B_j) = \Omega\left(|B_j| \cdot \frac{\gamma \log d}{d \log\log d}\right) = \Omega\left(\frac{\beta n \log d}{d\,(\log\log d)^2}\right).$$

This implies the claim about the integrality gap. 
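The bucketing step above can be sketched as follows. This is only an illustration under stated assumptions: logarithms are taken base 2, $k$ is rounded up to an integer, and the function name `heaviest_bucket` is hypothetical. It places each LP value into its class $B_j$ and returns the class with $j \ge 1$ of largest total LP mass, which by pigeonhole carries at least a $1/k$ fraction of the mass outside $B_0$.

```python
import math

def heaviest_bucket(x, d):
    """Split LP values x into classes B_0..B_k and return (j, B_j) for the heaviest j >= 1.

    B_0 holds indices with x_i <= 1/log d; B_j holds x_i in (2^(j-1)/log d, 2^j/log d].
    """
    log_d = math.log2(d)
    k = max(1, math.ceil(math.log2(log_d)))
    buckets = {j: [] for j in range(k + 1)}
    for i, xi in enumerate(x):
        if xi <= 1 / log_d:
            buckets[0].append(i)
        else:
            # Smallest j >= 1 with xi <= 2^j / log d (capped at k as a guard).
            j = max(1, math.ceil(math.log2(xi * log_d)))
            buckets[min(j, k)].append(i)
    j_best = max(range(1, k + 1), key=lambda j: sum(x[i] for i in buckets[j]))
    return j_best, buckets[j_best]
```

For example, with `d = 256` there are three classes above $B_0$, and values close to 1 land in the top class.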

A similar argument implies the constructive result. Let $\beta n$ denote the value of the SA relaxation.
As before, we assume that $\beta \ge 2/\log d$ and divide the vertices into $1 + \log\log d$ classes. Consider
the class $B_j$ with $j \ge 1$ that contributes most to the objective, and use the fact that the graph
induced on $B_j$ is locally $k$-colorable for $k = O((\log d/2^{j-1}) \cdot \log d) = O(\log^2 d)$. As in Section 4, we can
now use Johansson’s coloring algorithm (Theorem 6.1) to find a large independent set.
6 Johansson’s Algorithm for Coloring Sparse Graphs

For completeness, we give proofs for two results of Johansson [Joh96b] on coloring degree-$d$ graphs:
one about graphs where vertex neighborhoods can be colored using few colors (“locally-colorable”
graphs), and another about $K_r$-free graphs.

Theorem 6.1 For any $r, \Delta$, there exists a randomized algorithm that, given a graph $G$ with
maximum degree $\Delta$ such that the neighborhood of each vertex is $r$-colorable, outputs a proper coloring
of $V(G)$ using $O(\frac{\Delta}{\ln \Delta} \ln r)$ colors in expected $\mathrm{poly}(n 2^{\Delta})$ time.

Theorem 6.2 For any $r, \Delta$, there exists a randomized algorithm that, given a graph $G$ with
maximum degree $\Delta$ which excludes $K_r$ as a subgraph, outputs a proper coloring of $V(G)$ using
$O(\frac{\Delta}{\ln \Delta}(r^2 + r \ln\ln \Delta))$ colors in expected $\mathrm{poly}(n)$ time.

We emphasize that Johansson’s manuscript contains proofs of other results and extensions, such
as colorability under weaker conditions than above, and extensions to list-coloring; we omit these
extensions for now. Our presentation largely follows his, but streamlines some of the proofs using
techniques that have been developed since, such as concentration bounds for low-degree polynomials of
variables, and dependent rounding techniques. Roadmap: we first give the intuition in Section 6.1.
We give the proof of Theorem 6.1 in §6.3 through §8.5, and then show how to extend it to $K_r$-free graphs
in §9.


6.1 Overview and Ideas 

Johansson’s algorithm for locally-colorable graphs uses the “nibble” approach: in each round, some
$\theta > 0$ fraction of vertices get colored from their currently-allowable colors. The goal is to argue
(using concentration of measure, and the Local Lemma) that the degree of each surviving vertex
goes down exponentially like $(1 - O(\theta))^t$, whereas the number of colors does not decrease very
fast. This means that after $\approx (\varepsilon/\theta) \ln \Delta$ rounds the degree of the remaining vertices is smaller
than $\Delta^{1-\varepsilon}$ before we run out of the prescribed number of colors, at which point even the naive
greedy algorithm can color the remaining vertices with a few more colors. The proof for the
degree reduction uses concentration bounds for quadratic polynomials of random variables. The
real challenge is to lower-bound the number of remaining colors. Johansson’s argument shows that
the entropy of the probability distribution of a vertex over its colors remains high throughout the
process, and hence there must be many colors available. This requires a carefully orchestrated
process, which we describe next.

In more detail (but still at a high level): in each round, some $\theta \approx \Delta^{-1/4}$ fraction of the vertices
get activated, and each tentatively chooses a color from its own probability distribution. (This
per-vertex distribution is initially the uniform distribution.) Any vertex that gets the same color as
its neighbor rejects its color; since the number of these is small, we can ignore these for now. Then
each tentatively colored vertex (say $v$ with color $\gamma$), with probability $1/2$ accepts color $\gamma$ permanently
and deletes the probability mass corresponding to $\gamma$ from its neighbors (so that they cannot take
color $\gamma$); with the remaining probability $v$ rejects color $\gamma$ for this round and waits for another
round. In order to ensure the total probability mass at each vertex remains about 1, since the first
option caused the probability mass for color $\gamma$ to decrease at the neighbors, the second option must
increase color $\gamma$’s mass at the neighbors. If two of these neighbors $u, w$ are connected by an edge,
this means that we are increasing the chance that both of them will get color $\gamma$; this is potentially
worrisome.

This problem does not arise if the graph is triangle-free, because there are no edges in the
neighborhood of any vertex. In this case Johansson’s previous preprint [Joh96a] argued that the entropy
of each vertex’s distribution remains high, itself a clever and delicate argument (see [MR02,
Chapters 12-13]). However, if we just assume that the graph is locally $r$-colorable, the existence of edges
in node $v$’s neighborhood means that the probability mass for a color at both endpoints of an edge
may become higher, creating undesirable positive correlations. What Johansson’s new proof does
is simple but ingenious: it “reshuffles” the measure for the color randomly to some independent set
in the neighborhood. This is where the $r$-colorability condition kicks in: since there are large
independent sets (of size $\Delta/r$) in each neighborhood, the reshuffling does not change the probabilities
too suddenly. Now carefully applied concentration bounds and the LLL show a similar behavior as in
the triangle-free case, and prove Theorem 6.1.

The argument for $K_r$-free graphs requires a more involved recursive reshuffling operation: in this
case the size of the independent sets may be too small (if we just use Ramsey’s bound, for instance),
so the idea is to move the measure (on average) to sets that avoid $K_t$ for $t$ smaller than $r$. This
process (which Johansson calls a “trimming modifier”) creates a very slight negative correlation on
the edges, but this suffices to show Theorem 6.2.

6.2 Notation and Preliminaries 

We now define some notation and concepts, and give properties useful for the following proofs. We
will interchangeably use $u \sim v$ and $u \in N(v)$ to denote that $u$ and $v$ are adjacent.

6.2.1 Mean-One Random Variables 

An r.v. $X$ is a mean-one random variable (m.o. r.v.) if $X$ only takes on values in $\{0\} \cup [1, \infty)$, and
$E[X] = 1$. One simple class of m.o. r.v.s take on some value $c \ge 1$ w.p. $1/c$, and 0 w.p. $1 - 1/c$.

6.2.2 The Stopped Product 

Given a sequence of non-negative random variables $Y_1, Y_2, \ldots, Y_m$, and a “threshold” value $a > 0$,
define a stopping time $\tau_a$ as

$$\tau_a := \min\Big\{ t \;\Big|\; \prod_{i \le t} Y_i > a \Big\}.$$

Then the stopped product $\widehat{\prod}_i Y_i$ is defined as

$$\widehat{\prod_i} Y_i := \prod_{j \le \min(m, \tau_a)} Y_j.$$
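A minimal sketch of the stopped product, assuming the stopping rule is "stop right after the running product first strictly exceeds the threshold" as in the display above; the function name `stopped_product` is hypothetical. If the running product never exceeds the threshold, the full product is returned.

```python
def stopped_product(values, a):
    """Multiply values in order, stopping right after the running product first exceeds a."""
    prod = 1.0
    for y in values:
        prod *= y
        if prod > a:
            break  # the factor that crossed the threshold is still included
    return prod
```

When $a \ge 1$ and every factor is at most $c$, the running product before the final multiplication is at most $a$, so the result is at most $c \cdot a$. This is the property used later to cap the updated probabilities.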


6.2.3 The $\kappa$ and $\hat{\kappa}$ Functions

For a random variable $X$, define the function

$$\kappa(X) := E[X \ln X]. \qquad (9)$$

The following facts are easy to verify, and will be useful in calculations.

(a) If $X, Y$ are independent, then $\kappa(XY) = \kappa(X)\,E[Y] + \kappa(Y)\,E[X]$.

(b) Hence if $X, Y$ are independent m.o. r.v.s, then $\kappa(XY) = \kappa(X) + \kappa(Y)$.

(c) Also, $\kappa(aX) = E[X](a \ln a) + a\,\kappa(X)$.

(d) For an event $\mathcal{E}$ and the associated m.o. r.v. $X = \mathbf{1}_{\mathcal{E}}/\Pr[\mathcal{E}]$, $\kappa(X) = \ln(1/\Pr[\mathcal{E}])$.

(e) For a stopped product $X = \widehat{\prod}_i X_i$ (as defined in §6.2.2) of independent m.o. r.v.s with respect
to some threshold $a$,

$$\kappa(X) \le \sum_i \kappa(X_i).$$

(f) If $X = (1 - \mathbf{1}(\mathcal{E})) + \mathbf{1}(\mathcal{E}) \cdot Y$, and $Y$ is independent of the event $\mathcal{E}$, then $\kappa(X) = \Pr[\mathcal{E}]\,\kappa(Y)$.

It is also useful to define $\hat{\kappa}(X)$ as an absolute upper bound on $X$:

$$\hat{\kappa}(X) := \inf\{c \mid X \le c \text{ a.s.}\}. \qquad (10)$$
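Facts (b) and (d) can be checked numerically on finite distributions. The sketch below represents a random variable as a dictionary from values to probabilities and uses the convention $0 \ln 0 = 0$; the helper names `kappa`, `product_dist` and `two_point` are hypothetical.

```python
import math

def kappa(dist):
    """kappa(X) = E[X ln X] for a finite distribution {value: probability}; 0 ln 0 := 0."""
    return sum(p * x * math.log(x) for x, p in dist.items() if x > 0)

def product_dist(dx, dy):
    """Distribution of X*Y for independent X and Y."""
    out = {}
    for x, px in dx.items():
        for y, py in dy.items():
            out[x * y] = out.get(x * y, 0.0) + px * py
    return out

def two_point(c):
    """Mean-one r.v. that equals c with probability 1/c and 0 otherwise."""
    return {float(c): 1.0 / c, 0.0: 1.0 - 1.0 / c}
```

Here `two_point(c)` is the m.o. r.v. associated with an event of probability $1/c$, so fact (d) predicts `kappa(two_point(c))` equals $\ln c$, and fact (b) predicts that the product of independent `two_point(2)` and `two_point(3)` has $\kappa = \ln 2 + \ln 3$.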

6.3 The Algorithm for Locally-Colorable Graphs 

Let us present the algorithm for Theorem 6.1 about finding colorings of $r$-locally-colorable graphs.
The proof follows in §7.

Let $s = O(\frac{\Delta}{\ln \Delta} \ln r)$ be the number of colors we are aiming for, and let $L$ be the set of $s$ colors. For vertex
$v$, let $V_v = \{v\} \times L$ be the collection of tuples indicating which colors are still permissible for $v$. The
term whp$_\Delta$ denotes “with probability at least $1 - 1/\mathrm{poly}(\Delta)$”.

We follow the algorithmic outline from the overview in §6.1. The algorithm starts with each vertex
$v$ having a uniform probability distribution $p^0(v, \gamma) = 1/s$ over the colors $\gamma \in L$. In each stage $t$
we pick some vertices from the current graph $G^t$ and color them based on the current values of
$p^t(v, \gamma)$, then update the probability distributions of the other vertices to get $p^{t+1}(v, \gamma)$, drop the
colored vertices to get $G^{t+1}$, and proceed to the next stage. (This is the so-called “nibble”.) The
goal is to show that after sufficiently many stages we have a partial proper coloring using at most
$s$ colors, and the degree of the graph induced by the yet-uncolored vertices is at most $\Delta^{1-\varepsilon}$. We can then
use a greedy algorithm to color the remaining vertices.

The process in a generic stage t is as follows (we drop superscripts of t to avoid visual clutter). 

1. Let $\rho \in (0,1)$ be a threshold to be defined later. Define

$$p_a(v, \gamma) := p(v, \gamma) \cdot \mathbf{1}_{(p(v,\gamma) \le \rho)} \qquad (11)$$

$$p_c(v, \gamma) := p_a(v, \gamma) \cdot \mathbf{1}_{(\sum_{u \sim v} p(u,\gamma) \le 100 \ln \Delta)} \qquad (12)$$

Hence $p_a$ zeroes out any (vertex, color) tuple $(v, \gamma)$ which has a high value, and $p_c$ additionally
zeroes out $(v, \gamma)$ when $v$’s neighbors have a lot of probability mass on color $\gamma$. 4

2. For each vertex $v$ in the current graph $G$ and each color $\gamma$ in $L$, independently flip a coin
with probability $\theta p_c(v, \gamma)$. (The parameter $\theta \in (0,1)$ is defined later.) Let $A_{(v,\gamma)}$ be this
indicator variable. If $A_{(v,\gamma)} = 1$ then color $\gamma$ is tentatively assigned to $v$. Many colors may be
tentatively assigned to $v$.

Also, let $\eta_{(v,\gamma)} \sim \mathrm{Bin}(1/2)$ be an unbiased coin-flip independent of all else.

3. For $v \in V$, let $T_v = \{(v, \gamma) \mid A_{(v,\gamma)} = 1\}$, and let $T = \cup_v T_v$ be all the tentatively assigned
tuples. Let

$$C_v := \{(v, \gamma) \mid \eta_{(v,\gamma)} = 1 \wedge A_{(v,\gamma)} = 1 \wedge A_{(u,\gamma)} = 0 \ \forall u \sim v\},$$

and let $C = \cup_v C_v$ similarly. Note that the pair $(v, \gamma) \in T$ is dropped from $C$ if any neighbor
of $v$ is tentatively assigned color $\gamma$ (this ensures proper colorings), or if its own coin-flip $\eta_{(v,\gamma)}$
comes up tails (this gives us a “damping” that is useful for the rest of the process).

4 The definition of $p_c$ is not required to prove Theorem 6.1, but is useful in extending the result to $K_r$-free graphs.
The reader only interested in the former result should think of $p_c = p_a$ for this discussion.

4. For $v \in G$, if there exists some $\gamma$ such that $(v, \gamma) \in C$, then color $v$ with an arbitrary such $\gamma$
and exclude $v$ from $G'$. So the events $\{v \in G'\} = \{C_v = \emptyset\}$.

Now comes the changing of the probabilities via the “modifiers” (which are simply mean-one r.v.s, 
as defined in §6.2.1). 

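5. For each pair $(w, \gamma)$, generate modifiers $M^w_{(v,\gamma)}$ for all $v \in N(w)$:

• If $(w, \gamma) \notin T$ (i.e., $A_{(w,\gamma)} = 0$) then $M^w_{(v,\gamma)} = 1$ for all $v \in N(w)$.

• Else, if $(w, \gamma) \in T$, then for all $v \in N(w)$,

$$M^w_{(v,\gamma)} := 2\,(1 - \eta_{(w,\gamma)}) \cdot r \cdot \mathbf{1}_{(v \in S(w,\gamma))} \qquad (13)$$

where $S(w, \gamma) \subseteq N(w)$ is a random color class from an $r$-coloring of $N(w)$; this randomness
is independent of everything else, including the sets $S(w', \cdot)$ chosen at other vertices.

6. For each pair $(v, \gamma)$, collect modifiers $M^w_{(v,\gamma)}$ from its neighbors $w \sim v$, and define
$M_{(v,\gamma)} := \widehat{\prod}_{w \sim v} M^w_{(v,\gamma)}$. Here $\widehat{\prod}$ is the stopped product (as in §6.2.2) w.r.t. threshold $\rho/p(v, \gamma)$. 5 Finally
set the probability values for the next stage to be

$$p'(v, \gamma) := p(v, \gamma) \cdot M_{(v,\gamma)} \qquad (14)$$

Recalling the $\hat{\kappa}$ function from (10), observe that $\kappa := \max_{u,w,\gamma} \hat{\kappa}(M^w_{(u,\gamma)}) = 2r$; hence the
stopped product ensures $p'(v, \gamma) \le \kappa \cdot \rho = 2r\rho$; define $\rho^* = \kappa\rho$.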

We have now finished defining the new probabilities $p^{t+1}(v, \gamma) := p'(v, \gamma)$ and the new graph $G^{t+1} =
G'$, and proceed to the next stage. This is done for some $T = O(\frac{1}{\theta} \ln \Delta)$ many stages, after which
we claim that the degree of $G^T$ becomes $\ll s$, and a naive coloring suffices to color the rest of the
vertices.

The run time: the most expensive step is to find an $r$-coloring of the neighborhood of the vertices.
For each vertex this can be done in time $O(r \cdot \Delta \cdot 2^{\Delta})$ using $O(2^{\Delta})$ space (see [BHK09]). That is
followed by $T = O(\frac{\ln \Delta}{\theta})$ rounds of partial colorings, each taking $\mathrm{poly}(n\Delta)$ time. Hence the total
runtime is $O(nr \cdot 2^{\Delta}) + \mathrm{poly}(n)$.
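To make the six steps concrete, here is a simplified sketch of a single stage. It is an illustration under stated assumptions rather than the full procedure: it takes $p_c = p_a$ (which footnote 4 permits for Theorem 6.1), it assumes an $r$-coloring of every neighborhood is supplied as `classes[w]`, it pads each coloring with empty classes so that a vertex lies in the sampled class with probability exactly $1/r$, and it omits the Local Lemma resampling used in the analysis. All function and parameter names are hypothetical.

```python
import random

def nibble_round(adj, colors, p, theta, rho, classes, rng):
    """One stage. adj[v]: neighbour set; p[(v, c)]: weights; classes[w]: colour classes of N(w)."""
    r = max(1, max(len(cl) for cl in classes.values()))
    # Step 1 (with p_c = p_a): zero out weights above the threshold rho.
    p_c = {vc: (q if q <= rho else 0.0) for vc, q in p.items()}
    # Step 2: tentative assignments A and unbiased damping coins eta.
    A = {vc: rng.random() < theta * q for vc, q in p_c.items()}
    eta = {vc: rng.random() < 0.5 for vc in p}
    # Steps 3-4: keep (v, c) if eta = 1, A = 1 and no neighbour tentatively took c.
    new_color = {}
    for v in adj:
        kept = [c for c in colors
                if A[(v, c)] and eta[(v, c)] and not any(A[(u, c)] for u in adj[v])]
        if kept:
            new_color[v] = kept[0]

    def pick_class(w):
        # Uniform over r classes; missing classes count as empty sets.
        i = rng.randrange(r)
        return classes[w][i] if i < len(classes[w]) else frozenset()

    # Step 5: every tentatively assigned (w, c) samples a colour class S(w, c).
    S = {wc: pick_class(wc[0]) for wc, hit in A.items() if hit}

    def modifier(w, v, c):
        if not A[(w, c)]:
            return 1.0
        return 2.0 * (1 - eta[(w, c)]) * r * (v in S[(w, c)])

    # Step 6: stopped product over neighbours in a fixed order, threshold rho / p(v, c).
    p_new = {}
    for (v, c), q in p.items():
        if v in new_color:
            continue
        m = 1.0
        for w in sorted(adj[v]):
            m *= modifier(w, v, c)
            if q * m > rho:
                break
        p_new[(v, c)] = q * m
    adj_new = {v: adj[v] - set(new_color) for v in adj if v not in new_color}
    return new_color, p_new, adj_new
```

By construction, two adjacent vertices can never both keep the same color in one stage, and whenever $p(v, \gamma) \le \rho$ the updated value is at most $2r\rho$.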

7 The Proof of Theorem 6.1: I. The Setup 

The analysis of this coloring algorithm is similar in spirit to (but more technical than) Johansson’s
previous result for coloring triangle-free graphs of maximum degree $\Delta$. Although that result also
appears only as an unpublished manuscript [Joh96a], a lucid presentation appears in the book by
Molloy and Reed [MR02].

The idea is clever, but also natural in hindsight: as the process goes on and probability values $p(v, \gamma)$
increase, we want to show that for each surviving vertex, its degree goes down rapidly, whereas
it has many colors still remaining. Showing the former, that many vertices in each neighborhood
are colored at each step, proceeds by showing that for each $v$, there is not too much positive
correlation between the colors of its neighbors, and they behave somewhat independently. To show
the latter, that many colors remain valid for each vertex $v$, we show that the entropy of the $p(v, \cdot)$
“probability distribution” remains high. (The quotes are because we only have $\sum_\gamma p(v, \gamma) \approx 1$,
and not equal to one, but this approximate equality suffices.) And lower-bounding the entropy, as
in [Joh96a, MR02], relies on upper-bounding the “energy” of the edges which captures the positive
correlation between the colors of its endpoints, and is defined as follows:

5 The stopped product is with respect to a sequence, so let us assume a total order on the vertices, and the variables
$\{M^w_{(v,\gamma)}\}_{w \in N(v)}$ are considered in this order.

Definition 7.1 (Energy) For an edge $uv$ the energy with respect to the $p$ values is

$$\xi(uv, p) := \sum_\gamma p(u, \gamma)\, p(v, \gamma).$$

For a non-edge $uv$, define $\xi(uv, p) := 0$. For a graph $G$, the energy of a vertex $u$ is $\xi_G(u, p) :=
\sum_{v \in G} \xi(uv, p)$; when the graph is clear from context we drop the subscript and use just $\xi(u, p)$.
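The energy of Definition 7.1 is straightforward to compute. The sketch below assumes the weights are stored in a dictionary keyed by (vertex, color) pairs and the graph as adjacency sets; the function names are hypothetical.

```python
def edge_energy(p, u, v, colors):
    """Energy of an edge uv: sum over colours of p(u, c) * p(v, c)."""
    return sum(p[(u, c)] * p[(v, c)] for c in colors)

def vertex_energy(adj, p, v, colors):
    """Energy of v: sum of edge energies over its neighbours (non-edges contribute 0)."""
    return sum(edge_energy(p, u, v, colors) for u in adj[v])
```

With the uniform starting weights $1/s$, a vertex of degree $\deg(v)$ has energy $\deg(v) \cdot s \cdot (1/s^2) = \deg(v)/s$.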

7.1 The Parameters 


For ease of reference, we present the parameters used in the proof here. Some of these have already 
been used in the algorithm description, the others will be introduced in due course. 


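$\varepsilon := 1/100$

$\theta := \Delta^{-1/4+2\varepsilon}$

$\kappa := \max_{u,v,\gamma} \hat{\kappa}(M^u_{(v,\gamma)}) = 2r$

$a := 1/2 - 3\varepsilon$

$C := \max_{u,v,\gamma} \max\big(1, \ln \hat{\kappa}(M^u_{(v,\gamma)})\big) = \ln(2r)$

$\rho := \Delta^{-3/4-5\varepsilon}$

$c := 0$

$T := \frac{2\varepsilon}{a \cdot \theta} \ln \Delta$

$s := |L| = \Delta/K$

$\rho^* := \kappa\rho = 2r\rho$

$b := a - c$

$K := \frac{(b/4)\,\varepsilon \ln \Delta}{C}$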


We will assume that $r \le \Delta^{\varepsilon} = \Delta^{1/100}$, else the desired coloring number of $O(\frac{\Delta}{\ln \Delta} \ln r)$ will just
be $O(\Delta)$, which is trivial to achieve. Finally, we will assume that $\Delta$ is large enough whenever
necessary. In particular,

$$\ln \Delta \ge 10000 \qquad (15)$$

suffices for the rest of the analysis; however, no attempt has been made to optimize any constants.
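The relationships between the parameters can be checked mechanically. The sketch below encodes the parameter list as we read it from the section above (in particular $c = 0$, so $b = a$), with a hypothetical function name. Note that assumption (15) requires $\Delta \ge e^{10000}$, far beyond floating-point range, so any $\Delta$ usable here only illustrates the algebra and not the regime where the bounds are meaningful.

```python
import math

def johansson_parameters(Delta, r):
    """Parameters of Section 7.1 as read from the text (illustrative only)."""
    eps = 1 / 100
    theta = Delta ** (-1 / 4 + 2 * eps)   # fraction of vertices activated per stage
    kappa = 2 * r                         # largest value a modifier can take
    a = 1 / 2 - 3 * eps
    C = max(1.0, math.log(2 * r))
    rho = Delta ** (-3 / 4 - 5 * eps)     # cap on a single (vertex, colour) weight
    c = 0.0
    b = a - c
    T = 2 * eps * math.log(Delta) / (a * theta)   # number of stages
    K = (b / 4) * eps * math.log(Delta) / C
    s = Delta / K                                 # number of colours
    return dict(eps=eps, theta=theta, kappa=kappa, a=a, C=C, rho=rho, c=c,
                b=b, T=T, K=K, s=s, rho_star=kappa * rho)
```

Two identities used repeatedly in the analysis follow directly: $\theta\rho\Delta = \Delta^{-3\varepsilon}$ and $a\theta T = 2\varepsilon \ln \Delta$.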

7.2 Initial Values 

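At the beginning, the probability values are $p^0(v, \gamma) = 1/s$ for all $(v, \gamma)$, and the degrees are at
most $\Delta$. Hence

$$p^0(V_v) = 1 \qquad \text{(P0)}$$

$$d(v, G^0) \le \Delta \qquad \text{(D0)}$$

$$\xi_{G^0}(v, p^0) = \sum_{\gamma,\, u \sim v} p^0(v, \gamma)\, p^0(u, \gamma) \le \Delta \cdot s \cdot 1/s^2 = K \qquad \text{(E0)}$$

$$h(v, p^0) = \sum_\gamma p^0(v, \gamma) \ln \frac{1}{p^0(v, \gamma)} = \ln s = \ln \Delta - \ln K \qquad \text{(H0)}$$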

7 
7.3 The Invariants

The proof is by induction over the stages. We maintain the invariants that for all $t \le T$ and all
$v \in G^t$, the following hold:

$$p^t(V_v) \in 1 \pm t\lambda_P \subseteq 1 \pm \sqrt{\theta} \subseteq 1 \pm \varepsilon \subseteq [1/2, 2] \qquad \text{(InvP)}$$

$$d(v, G^t) \le \Delta e^{-a\theta t} + t\lambda_D \qquad \text{(InvD)}$$

$$\xi_{G^t}(v, p^t_a) \le K e^{-b\theta t} + \frac{\lambda_E}{b\theta} \le 2K \qquad \text{(InvE)}$$

$$h(v, p^t) \ge h(v, p^0) - \theta C \sum_{t' < t} \xi_{G^{t'}}(v, p^{t'}_a) - t\lambda_H \ge (1 - \varepsilon) \ln \Delta \qquad \text{(InvH)}$$

where

$$\lambda_P = O(\sqrt{\rho^* \ln \Delta}), \quad \lambda_D = O(K\theta^2\Delta), \quad \lambda_E = O(\theta\Delta^{-2\varepsilon} \ln \Delta), \quad \lambda_H = O(\sqrt{\rho^* \ln^3 \Delta}). \qquad (16)$$

In the following, we will show that the tightest bounds hold for each $t$. The weaker bounds given
above are just implications useful in our proofs, and in all cases these follow by algebra. Observe that
the initial values from §7.2 satisfy these invariants.

8 The Proof of Theorem 6.1: II. The Inductive Step 

We assume the invariants hold for all times up to and including time $t$, and now want to show these
are satisfied at the end of stage $t + 1$. As usual, we use $p$ to denote $p^t$, and $p'$ to denote $p^{t+1}$.
The plan is to show that for each fixed vertex $v$, each of the invariants holds with high probability.
Since we cannot take a union bound without losing terms dependent on $n$, we show that the failure
events depend only on a small number of other failure events, whereupon we can apply the Lovász
Local Lemma to complete the argument.

In the following arguments, we assume that d(v) > for all v € Gt- Indeed, at the beginning of 
any stage we can repeatedly remove vertices with degree less than and having found a coloring 
for the rest of the vertices, we can color these removed vertices greedily at the end. 

8.1 The Total Probability 

By construction, the probability values $p^t(v, \gamma)$ form a martingale, and hence it is not surprising
that their sum remains concentrated around 1. Here is the formal proof.

Lemma 8.1 For vertex $v$, if $p(V_v) \in 1 \pm \varepsilon$, then $p'(V_v) = p(V_v) \pm O(\sqrt{\rho^* \ln \Delta})$ whp$_\Delta$.

Proof: By construction of the modifiers as mean-one r.v.s, $E[p'(v, \gamma)] = p(v, \gamma) \cdot E[M_{(v,\gamma)}] = p(v, \gamma)$
for all $(v, \gamma)$. Moreover, $p'(V_v)$ is the sum of independent $\rho^*$-bounded random variables, with
$E[p'(V_v)] \le (1 + \varepsilon) \le 2$, so Theorem A.1 implies that the deviation $|p'(V_v) - p(V_v)|$ is at most
$\lambda_P := O(\sqrt{\rho^* \ln \Delta} + \rho^* \ln \Delta) = O(\sqrt{\rho^* \ln \Delta})$ whp$_\Delta$. ■
8.1.1 The Lovász Local Lemma Argument

To get the property of Lemma 8.1 for all vertices $v$ simultaneously requires the LLL, since we
cannot take a union bound over all the $n$ vertices. For this, define the bad event $B_v = \{\omega : p'(V_v) >
p(V_v) + \lambda_P\}$. Note that this event depends only on a subset of the variables $A_{(u,\gamma)}$, $\eta_{(u,\gamma)}$ and the
associated modifiers $M$ for $u \in \{v\} \cup N(v)$ and $\gamma \in L$, and all these $A, \eta, M$ variables are independent. Hence, $B_u$ and
$B_v$ are clearly independent if $N(u) \cap N(v) = \emptyset$, and the dependency graph has degree at most
$\Delta^2 s \le \Delta^3$. Using the Moser-Tardos framework [MT10] we can get a coloring where none of these
bad events happen.

For each of the other invariants, we use a similar approach using the LLL: we define “local”
bad events (i.e., the bad event at a vertex will depend only on r.v.s for vertices at some constant
distance from it), and hence the degree of the dependency graph over the bad events will be
bounded by $\mathrm{poly}(\Delta)$. Moreover, the probability of each bad event will be at most $1/\mathrm{poly}(\Delta)$.
Using the Moser-Tardos framework will allow us to find an outcome that will avoid all the bad
events at all vertices simultaneously. Since the arguments will be very similar, we will henceforth
just explain what the bad events are, and omit the details.
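The resampling idea behind the Moser-Tardos framework can be sketched generically. This is a simplified illustration, not the procedure as it would be instantiated for the coloring algorithm: it assumes each bad event is given together with the variables it reads, and it resamples one violated event at a time. The function name, the `max_rounds` safeguard and the calling convention are all hypothetical.

```python
import random

def moser_tardos(variables, sample, bad_events, rng, max_rounds=100_000):
    """Resample until no bad event holds.

    variables: iterable of variable names; sample(name, rng) draws a fresh value;
    bad_events: list of (names, is_bad), where is_bad(assignment) reads only `names`.
    """
    assignment = {x: sample(x, rng) for x in variables}
    for _ in range(max_rounds):
        violated = [ev for ev in bad_events if ev[1](assignment)]
        if not violated:
            return assignment
        names, _ = violated[0]
        for x in names:  # resample only the variables of one violated event
            assignment[x] = sample(x, rng)
    raise RuntimeError("no good assignment found within max_rounds")
```

In the setting above, the variables would be the coin flips and color-class choices of one stage, and each bad event would be the failure of one invariant at one vertex.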

8.2 The Degree 

Lemma 8.2 For vertex $v$, whp$_\Delta$ the new degree is

$$d'(v) \le (1 - a\theta)\, d(v) \pm O(K\theta^2\Delta).$$

It is more convenient to study $X_u$, the indicator of whether $u$ was colored in this round; let
$X = \sum_{u \sim v} X_u$. Then $d'(v) = d(v) - X$. We will first show that $E[X]$ is large, and then that $X$ is
concentrated around its mean. Let

$$Y_u := 1 - \prod_\gamma \big(1 - A_{(u,\gamma)}\, \eta_{(u,\gamma)}\big) \qquad (17)$$

indicate whether $u$ was tentatively assigned at least one color that was not dropped due to the $\eta$
coin flip; clearly $X_u \le Y_u$. Moreover, let

$$Y'_u := \sum_\gamma \sum_{w \sim u} A_{(u,\gamma)}\, A_{(w,\gamma)}. \qquad (18)$$

It is easy to see that $Y_u - Y'_u \le X_u$.

Claim 8.3 $\Pr[u \text{ is colored}] = E[X_u] \ge (1/2 - 2\varepsilon)\,\theta = (a + \varepsilon)\,\theta$.

Proof: By inclusion-exclusion,

$$E[Y_u] \ge \sum_\gamma \frac{\theta}{2}\, p_c(u, \gamma) - \sum_{\gamma, \gamma'} \frac{\theta^2}{4}\, p_c(u, \gamma)\, p_c(u, \gamma') = \frac{\theta}{2}\, p_c(V_u) \Big(1 - \frac{\theta}{2}\, p_c(V_u)\Big).$$

Using (InvP), (InvE) and (InvH) in Lemma A.5, we know that $p_c(V_u) \ge 1 - 6(\sqrt{\theta} + \theta) - 2\varepsilon \ge
1 - 12\sqrt{\theta} - 2\varepsilon$. Also, by (InvP), $p_c(V_u) \le p(V_u) \le 2$, so by algebra we get $E[Y_u] \ge \theta(1/2 - \varepsilon) - O(\theta^{3/2})$.
Moreover,

$$E[Y'_u] = \theta^2 \sum_\gamma \sum_{w \sim u} p_c(u, \gamma)\, p_c(w, \gamma) = \theta^2\, \xi(u, p_c).$$

Using that $\xi(u, p_c) \le \xi(u, p) \le 2K$ for all $u$ (from (InvE)), we infer $E[Y'_u] \le 2K\theta^2$. Since $K\theta^2 =
O(\theta^{3/2})$,

$$\Pr[u \text{ colored}] = E[X_u] \ge E[Y_u - Y'_u] \ge \theta(1/2 - \varepsilon) - O(\theta^{3/2}) \ge (1/2 - 2\varepsilon)\,\theta.$$

The final inequality holds for large enough $\Delta$ (15), and the claim follows using the definition of $a$. ■
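The inclusion-exclusion step used in the proof of Claim 8.3 is the second-order Bonferroni inequality: for independent events with probabilities $q_1, \ldots, q_m$, the probability that at least one occurs is at least $\sum_i q_i - \sum_{i<j} q_i q_j$. A small numerical check, with hypothetical helper names:

```python
import math

def union_probability(qs):
    """Exact probability that at least one of several independent events occurs."""
    return 1.0 - math.prod(1.0 - q for q in qs)

def bonferroni_lower_bound(qs):
    """Second-order inclusion-exclusion lower bound on the union probability."""
    total = sum(qs)
    pairs = sum(qs[i] * qs[j] for i in range(len(qs)) for j in range(i + 1, len(qs)))
    return total - pairs
```

In the claim, each $q_\gamma$ corresponds to $\frac{\theta}{2} p_c(u, \gamma)$, and the pair sum is what produces the lower-order $O(\theta^2)$ correction.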


Corollary 8.4 $E[d'(v)] \le (1 - a\theta) \cdot d(v)$.

Proof: Let $X := \sum_{u \sim v} X_u$ be the number of neighbors of $v$ that get colored; by
Claim 8.3, we get that

$$E[X] = \sum_{u \sim v} E[X_u] \ge d(v) \cdot a\theta.$$

Finally, observing that $d'(v) = d(v) - X$ completes the proof. ■

8.2.1 The Concentration Bound for Degrees 

We want to show that $X - E[X]$ is small whp$_\Delta$. Let $Y := \sum_{u \sim v} Y_u$ and $Y' := \sum_{u \sim v} Y'_u$. For the
upper tail, observe that as $Y - Y' \le X \le Y$, we have

$$X - E[X] \le Y - E[Y - Y'] = (Y - E[Y]) + E[Y']. \qquad (19)$$

But $Y$ is the sum of independent $\{0,1\}$-valued r.v.s $\{Y_u\}_{u \sim v}$, and by inclusion-exclusion again each
$E[Y_u] \le \sum_\gamma \frac{\theta}{2}\, p_c(u, \gamma)$. Thus

$$E[Y] \le \frac{\theta}{2} \sum_{u \sim v} \sum_\gamma p_c(u, \gamma) \le O(\theta\, d(v)) = O(\theta\Delta).$$

By the tail bound Theorem A.1, setting $\lambda_D^{(1)} := O(\sqrt{\theta\Delta \ln \Delta} + \ln \Delta)$ suffices to give $\Pr[|Y - E[Y]| >
\lambda_D^{(1)}] \le 1/\mathrm{poly}(\Delta)$. Plugging this into (19) and using the bound $E[Y'] = \sum_{u \sim v} E[Y'_u] \le 2K\theta^2\Delta =:
\lambda_D^{(2)}$ gives us that the total deviation of $X$ above its mean is at most $\lambda_D^{(1)} + \lambda_D^{(2)} = O(K\theta^2\Delta)$ whp$_\Delta$.
For the lower tail, observe that

$$E[X] - X \le E[Y] - (Y - Y') = (E[Y] - Y) + E[Y'] + (Y' - E[Y']). \qquad (20)$$

Since $E[Y] - Y \le \lambda_D^{(1)}$ and $E[Y'] \le \lambda_D^{(2)}$ by the preceding argument, we focus on bounding the
upper tail of $Y'$. For this we use the concentration inequality for polynomials from Theorem A.2.
The parameters are:

$$\mu_0 = E[Y'] \le d(v) \cdot 2K\theta^2 \le 2K\theta^2\Delta \qquad (21)$$

$$\mu_1 \le 2 \max_{w, \gamma} \sum_{u \sim w} E[A_{(u,\gamma)}] = 2 \max_{w, \gamma} \sum_{u \sim w} \theta\, p_c(u, \gamma) \le 2\theta\rho\Delta = O(\Delta^{-3\varepsilon}) \le 1 \qquad (22)$$

$$\mu_2 = 2. \qquad (23)$$

Plugging this into Corollary A.3 of the aforementioned concentration inequality, we get that whp$_\Delta$
the deviation $|Y' - E[Y']|$ is $O(\sqrt{2K\theta^2\Delta \ln \Delta} + \ln^2 \Delta) =: \lambda_D^{(3)}$. Substituting into (20), we have that
whp$_\Delta$,

$$E[X] - X \le \lambda_D^{(1)} + \lambda_D^{(2)} + \lambda_D^{(3)} = O(K\theta^2\Delta) =: \lambda_D. \qquad (24)$$

This proves Lemma 8.2.

Finally, the LLL part: here the bad event is $B_v = \{\omega : d'(v) > (1 - a\theta)\, d(v) + \lambda_D\}$, and this depends
on the $A, \eta, M$ r.v.s for vertices at distance at most 2 from $v$ (since it depends on whether the vertices $u \in N(v)$
survive, which depends on their neighbors). Hence the dependency is at most $\Delta^4 \times s$.

8.3 The Entropy 

Now to show invariant (InvH), that the entropy of $\{p'(v, \gamma)\}_\gamma$ remains high, where the entropy is
defined as

$$h(v, p') := -\sum_\gamma p'(v, \gamma) \ln p'(v, \gamma). \qquad (25)$$

Lemma 8.5 $E[h(v, p')] \ge h(v, p) - C\theta\, \xi(v, p_a)$.

Proof: Recall that $p'(v, \gamma) := p(v, \gamma) \cdot M_{(v,\gamma)}$ from (14), where $M_{(v,\gamma)}$ is a m.o. r.v.

$$h(v, p') = -\sum_\gamma p'(v, \gamma) \ln p'(v, \gamma) = -\sum_\gamma p(v, \gamma)\, M_{(v,\gamma)} \ln p(v, \gamma) - \sum_\gamma p(v, \gamma)\, M_{(v,\gamma)} \ln M_{(v,\gamma)}.$$

Hence, taking expectations,

$$E[h(v, p')] = -\sum_\gamma E[M_{(v,\gamma)}]\, p(v, \gamma) \ln p(v, \gamma) - \sum_\gamma p(v, \gamma)\, E[M_{(v,\gamma)} \ln M_{(v,\gamma)}] = h(v, p) - \sum_\gamma p(v, \gamma)\, E[M_{(v,\gamma)} \ln M_{(v,\gamma)}].$$

In fact, if the probability $p(v, \gamma)$ is greater than $\rho$ for some $\gamma$ (i.e., if $p_a(v, \gamma) = 0$), then the definition
of the stopped product implies that $M_{(v,\gamma)} = 1$. Hence, we get the stronger claim that

$$E[h(v, p')] = h(v, p) - \sum_\gamma p_a(v, \gamma)\, E[M_{(v,\gamma)} \ln M_{(v,\gamma)}]. \qquad (26)$$

Now, recall that for an r.v. $X$, we defined $\kappa(X) = E[X \ln X]$ in §6.2.3.

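Claim 8.6 $\kappa(M_{(v,\gamma)}) = E[M_{(v,\gamma)} \ln M_{(v,\gamma)}] \le \theta C \sum_{w \sim v} p_c(w, \gamma)$.

Proof: Observe that $M_{(v,\gamma)}$ is a stopped product of a bunch of m.o. r.v.s $M^w_{(v,\gamma)}$ of neighbors
$w \sim v$, each of which is either 1 (if $(w, \gamma) \notin T$) or itself a product of two m.o. r.v.s as in (13). Using
properties of the $\kappa(\cdot)$ function from §6.2.3, we get

$$\kappa(M_{(v,\gamma)}) \le \sum_{w \sim v} \big(\theta\, p_c(w, \gamma)\big) \cdot \max_{w \sim v} \ln \hat{\kappa}(M^w_{(v,\gamma)}). \qquad (27)$$

Using the definition of $C$ gives us the claim. ■

Substituting Claim 8.6 into (26),

$$E[h(v, p')] \ge h(v, p) - \theta C \sum_\gamma \sum_{w \sim v} p_a(v, \gamma)\, p_c(w, \gamma) \ge h(v, p) - \theta C\, \xi(v, p_a). \qquad (28)$$

This proves the lemma. ■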

Lemma 8.7 The deviation $|h(v, p') - E[h(v, p')]|$ is at most $O(\sqrt{\rho^* \ln^3 \Delta})$ whp$_\Delta$.

Proof: By definition, the entropy $h(v, p')$ is a sum of independent r.v.s $p'(v, \gamma) \ln \frac{1}{p'(v, \gamma)}$; since
$p'(v, \gamma) \in [1/s, \rho^*]$, these r.v.s are $[0, m]$-bounded where $m := (\rho^* \ln s) \le \rho^* \ln \Delta$. Moreover, the
mean $\mu$ satisfies $\mu \le \ln \Delta$ because the entropy can be at most $\ln s \le \ln \Delta$. Hence by the tail
bound from Theorem A.1, the deviation from the mean is at most $\lambda_H = O(\sqrt{\mu m \ln \Delta} + m \ln \Delta) =
O(\sqrt{\rho^* \ln^3 \Delta})$ whp$_\Delta$. ■

For the LLL application, since the entropy depends only on the $p'(v, \gamma)$ values, the bad event for
$v$ is dependent only on the random choices of $\{v\} \cup N(v)$, and hence easily handled for the usual
reasons.

8.4 The Energy 

The calculations above show that the decrease in entropy at vertex $v$ depends on the energy of
edges incident to $v$, so it remains to show that this energy is small (and in fact, decreases over
the course of the algorithm). This is technically the most interesting part of the analysis. We are
interested in

$$\xi_G(v, p_a) = \sum_{u \sim v} \xi(uv, p_a) = \sum_{u \sim v} \sum_\gamma p_a(u, \gamma)\, p_a(v, \gamma), \qquad (29)$$

and want to show that this energy drops by a factor of $\approx (1 - \theta/2)$ each time. Formally, we prove
the following:

Lemma 8.8 For a vertex $v$, whp$_\Delta$ the energy

$$\xi_{G'}(v, p'_a) \le (1 - b\theta) \cdot \xi_G(v, p_a) + O(\theta\Delta^{-2\varepsilon} \ln \Delta).$$

As for the other invariants, we first bound the expectation and then show a large-deviations bound.
The expectation calculation itself proceeds via two claims: the first claim quantifies the change
in energy due to considering the new probability distribution $p'_a$ instead of $p_a$ (keeping the graph
$G$ fixed), and the second captures the change due to considering graph $G'$ instead of $G$ (but now
keeping the distribution $p'$ fixed). Remember that the energy of non-edges is zero by definition.

Claim 8.9 For any $v$, $\sum_{u \in G} E[\xi(uv, p'_a)] \le \sum_{u \in G} \xi(uv, p_a)$.

Claim 8.10 For any $v$, $E\big[\sum_{u \in G'} \xi(uv, p'_a)\big] \le (1 - a\theta) \sum_{u \in G} E[\xi(uv, p'_a)]$.
In the latter claim, observe that $G'$ is a random variable itself, and hence we cannot just push the
expectation inside the sum. Before we prove Claim 8.9, we will use the following observation which
allows us to consider the unstopped product instead of the stopped product.

Fact 8.11 For any $u, v \in G$, and $\gamma \in L$, $p'_a(u, \gamma)\, p'_a(v, \gamma) \le p_a(u, \gamma)\, p_a(v, \gamma) \prod_{w \sim u} M^w_{(u,\gamma)} \prod_{x \sim v} M^x_{(v,\gamma)}$.

Proof: Indeed, the products are stopped only when the terms multiplied give us value at least $\rho$,
but then the corresponding $p'_a$ values get zeroed out. ■

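Proof of Claim 8.9: Using Fact 8.11, we can replace the stopped product by the usual ones:

$$E[p'_a(u, \gamma)\, p'_a(v, \gamma)] \le p_a(u, \gamma)\, p_a(v, \gamma)\, E\Big[\prod_{w \sim u} M^w_{(u,\gamma)} \prod_{x \sim v} M^x_{(v,\gamma)}\Big]$$

$$\le p_a(u, \gamma)\, p_a(v, \gamma)\, E\Big[\prod_{w \sim u :\, w \not\sim v} M^w_{(u,\gamma)}\Big] \cdot E\Big[\prod_{x \sim v :\, x \not\sim u} M^x_{(v,\gamma)}\Big] \cdot E\Big[\prod_{w :\, uvw \in \triangle} M^w_{(u,\gamma)}\, M^w_{(v,\gamma)}\Big].$$

But the first two expectations are equal to one, since each of these is a product of independent m.o.
r.v.s. For the third one, with probability $(1 - \theta p_c(w, \gamma))$ the pair $(w, \gamma) \notin T$ and we get 1, else
with probability $\theta p_c(w, \gamma)$ the pair $(w, \gamma) \in T$ and the random color class modifier gives zero
(adjacent vertices $u$ and $v$ cannot lie in the same color class). In all cases the final expectation is at most one. ■

Proof of Claim 8.10: In this claim, it suffices to bound, for each color $\gamma$, $E[p'_a(u, \gamma)\, p'_a(v, \gamma)\, \mathbf{1}_{(u \in G')}]$.
The crucial observation is that for any $\gamma$, the events

$$\{u \in G'\} = \{C_u = \emptyset\} \qquad (30)$$

$$= \{C_u \subseteq \{(u, \gamma)\}\} - \{C_u = \{(u, \gamma)\}\} \qquad (31)$$

$$= \{C \cap (\{u\} \times (L \setminus \{\gamma\})) = \emptyset\} - \{C_u = \{(u, \gamma)\}\}. \qquad (32)$$

However, when $C_u = \{(u, \gamma)\}$, then $A_{(u,\gamma)} = 1$ and $\eta_{(u,\gamma)} = 1$, which means that $p'_a(v, \gamma) = 0$ due
to the modifier $M^u_{(v,\gamma)}$, and hence $E[p'_a(u, \gamma)\, p'_a(v, \gamma)\, \mathbf{1}_{(C_u = \{(u,\gamma)\})}] = 0$. Moreover, the former event
says that $u$ did not get any color from the set $L \setminus \{\gamma\}$, which is independent of all decisions for the
color $\gamma$. Hence

$$E[p'_a(u, \gamma)\, p'_a(v, \gamma)\, \mathbf{1}_{(u \in G')}] = E[p'_a(u, \gamma)\, p'_a(v, \gamma)] \cdot \Pr[C \cap (\{u\} \times (L \setminus \{\gamma\})) = \emptyset]. \qquad (33)$$

Using (32) again, we know that

$$\Pr[C \cap (\{u\} \times (L \setminus \{\gamma\})) = \emptyset] = \Pr[u \in G'] + \Pr[C_u = \{(u, \gamma)\}] \qquad (34)$$

$$\le \Pr[u \in G'] + \Pr[A_{(u,\gamma)} = 1] \qquad (35)$$

$$\le (1 - (a + \varepsilon)\theta) + \theta\rho, \qquad (36)$$

where the first expression is from Claim 8.3 and the second one from $p_c(u, \gamma) \le \rho$. But $\theta\rho \le \varepsilon\theta$ for
large $\Delta$. Thus the claim is proved. ■

Combining Claims 8.9 and 8.10, and using $a = b$,

$$E[\xi_{G'}(v, p'_a)] = E\Big[\sum_{u \in G'} \xi(uv, p'_a)\Big] \le (1 - b\theta) \sum_{u \in G} \xi(uv, p_a) = (1 - b\theta)\, \xi_G(v, p_a). \qquad (37)$$

We next show that the r.v. $\xi_{G'}(v, p'_a)$ is concentrated around its mean.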
8.4.1 The Concentration Bound for Energy

Fix some $v \in G$. We want to show that $\sum_{u \in G} \xi(uv, p'_a)\, \mathbf{1}_{(u \in G')}$ is concentrated around its mean.
The idea is simple: denoting $Q_u = \xi(uv, p'_a)$ and $R_u = \mathbf{1}_{(u \in G')}$, and letting $q_u = E[Q_u]$, $r_u = E[R_u]$
be their expectations, the triangle inequality gives us that

$$\Big|\sum_u Q_u R_u - \sum_u E[Q_u R_u]\Big| \le \Big|\sum_u (Q_u - q_u) R_u\Big| + \Big|\sum_u q_u (R_u - r_u)\Big| + \Big|\sum_u \big(q_u r_u - E[Q_u R_u]\big)\Big|.$$

The following three claims now bound the three expressions on the right.

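Claim 8.12 $\big|\sum_{u \in G} \xi(uv, p'_a)\, \mathbf{1}_{(u \in G')} - \sum_{u \in G} E[\xi(uv, p'_a)]\, \mathbf{1}_{(u \in G')}\big| \le O(\theta\Delta^{-7\varepsilon} \ln \Delta)$ whp$_\Delta$.

Proof: The left hand side is at most $\sum_{u \in N_G(v)} |\xi(uv, p'_a) - E[\xi(uv, p'_a)]|$. Each $\xi(uv, p'_a)$ is a sum of
independent $\rho^2$-bounded r.v.s, and hence deviates from its mean by at most $O(\sqrt{E[\xi(uv, p'_a)]\, \rho^2 \ln \Delta} +
\rho^2 \ln \Delta)$ whp$_\Delta$ by the tail bound in Theorem A.1. Summing this over all $u \in N_G(v)$ (and taking a
union bound over these $\Delta$ events), we get the total deviation whp$_\Delta$ is

$$\sum_{u \in N_G(v)} O\Big(\rho \sqrt{E[\xi(uv, p'_a)] \ln \Delta} + \rho^2 \ln \Delta\Big)
\le O\Big(\rho \sqrt{\ln \Delta} \cdot \sqrt{\Delta} \cdot \sqrt{E\Big[\sum_{u \in N_G(v)} \xi(uv, p'_a)\Big]}\Big) + O(\rho^2 \Delta \ln \Delta)
\le O\big(\rho \sqrt{\Delta K \ln \Delta} + \rho^2 \Delta \ln \Delta\big).$$

The first inequality uses Cauchy-Schwarz; the next one uses the expectation bound from Claim 8.9
and invariant (InvE). The dominant term is $O(\rho \sqrt{\Delta K \ln \Delta}) = O(\theta\Delta^{-7\varepsilon} \ln \Delta)$, which completes the
claim. ■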


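Claim 8.13 $\big|\sum_{u \in G} E[\xi(uv, p'_a)]\, \mathbf{1}_{(u \in G')} - \sum_{u \in G} E[\xi(uv, p'_a)]\, E[\mathbf{1}_{(u \in G')}]\big| \le O(\theta\Delta^{-2\varepsilon} \ln \Delta)$ whp$_\Delta$.

Proof: By Claim 8.9, each term

$$E[\xi(uv, p'_a)] \le \xi(uv, p_a) = \sum_\gamma p_a(u, \gamma)\, p_a(v, \gamma) \le \rho \sum_\gamma p_a(v, \gamma) \le 2\rho. \qquad (38)$$

Using $c_u := E[\xi(uv, p'_a)]$ and observing that $c_u = 0$ for all $u \notin N_G(v)$, we want to bound the
deviation

$$\Big|\sum_{u \in N_G(v)} c_u \big(\mathbf{1}_{(u \in G')} - \Pr[u \in G']\big)\Big|.$$

The argument now follows that of §8.2.1, with the only difference that the variables are in $[0, 2\rho]$
rather than $\{0, 1\}$, which merely multiplies the deviation from the mean by a factor of $2\rho$. Hence,
whp$_\Delta$, we have

$$\Big|\sum_{u \in N_G(v)} c_u\, \mathbf{1}_{(u \in G')} - \sum_{u \in N_G(v)} c_u \Pr[u \in G']\Big| \le O(\theta^2 K \rho \Delta) = O(\theta\Delta^{-2\varepsilon} \ln \Delta).$$

This proves the claim. ■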

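Claim 8.14 $\big|\sum_{u \in G} E[\xi(uv, p'_a)]\, E[\mathbf{1}_{(u \in G')}] - \sum_{u \in G} E[\xi(uv, p'_a)\, \mathbf{1}_{(u \in G')}]\big| \le O(\theta\Delta^{-1/2})$ whp$_\Delta$.

Proof: Using (33) and (35), summing over all colors $\gamma$, and using the fact that $\Pr[A_{(u,\gamma)} = 1] \le \theta\rho$
for all $\gamma$,

$$E[\xi(uv, p'_a)]\, E[\mathbf{1}_{(u \in G')}] \le E[\xi(uv, p'_a)\, \mathbf{1}_{(u \in G')}] \le E[\xi(uv, p'_a)] \big(E[\mathbf{1}_{(u \in G')}] + \theta\rho\big).$$

Rearranging, summing over all $u \in G$, and using that $E[\xi(uv, p'_a)] \le 2\rho$ by the calculation in (38),

$$0 \le \sum_{u \in G} E[\xi(uv, p'_a)\, \mathbf{1}_{(u \in G')}] - \sum_{u \in G} E[\xi(uv, p'_a)]\, E[\mathbf{1}_{(u \in G')}] \le \sum_{u \in G} E[\xi(uv, p'_a)]\, \theta\rho \qquad (39)$$

$$\le \Delta \cdot 2\theta \cdot \rho^2. \qquad (40)$$

This is at most $O(\theta\Delta^{-1/2})$, which proves the claim. ■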

Putting Claims 8.12-8.14 together, we get that whp$_\Delta$

$$\sum_{u \in G} \xi(uv, p'_a)\, \mathbf{1}_{(u \in G')} \le \sum_{u \in G} E[\xi(uv, p'_a)\, \mathbf{1}_{(u \in G')}] + O(\theta\Delta^{-2\varepsilon} \ln \Delta) \qquad (41)$$

$$\le (1 - b\theta) \cdot \sum_{u \in G} \xi(uv, p_a) + O(\theta\Delta^{-2\varepsilon} \ln \Delta), \qquad (42)$$

where the latter inequality follows from Claims 8.9 and 8.10.

Finally, for the LLL, the bad events in this case are again dependent only on the random choices
within distance 2 of $v$, which means the dependency is $O(\Delta^4 s)$.


8.5 Behavior after T rounds, and Maintaining Invariants 


In the previous sections, we showed that if the invariants (InvP)-(InvH) held at the beginning of
a specific round, then using the LLL we can ensure that they hold at the end of the round, with
some additional loss. Using this, we now show that the bounds we have derived suffice for the
invariants to hold at each round in $[1..T]$. For all these bounds we use that $\Delta$ is large enough (15).

• Probability. After each round, the probability value may increase by Ap. This means after 
T rounds, 

p\V v ) < 1 ± TAp = 1 ± 2gln f A) In A • 0A^/ 8 - 4e \/RA = 1 ± y/0. 

a9 

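• Degree. In each round, the degree falls by a multiplicative factor of $(1 - a\theta)$, but potentially
increases by an additive term of $\lambda_D$. This means that after $t$ rounds, the degree can be
(wastefully) bounded by

$$d(v, G_t) \le (1 - a\theta)^t \cdot \Delta + t\lambda_D \le e^{-a\theta t} \Delta + O(K \theta^2 \Delta t).$$

Now for $t = T$, we get that

$$d(v, G_T) \le \Delta e^{-2\varepsilon \ln \Delta} + O(K \theta \Delta) \le \Delta^{1-\varepsilon}. \tag{43}$$

• Energy. Again the energy falls by a $(1 - b\theta)$ factor but potentially increases by an additive
term of $\lambda_E$. This time we use a slightly better bound (see footnote 6) of

$$\begin{aligned}
\xi_G(v, p^t_a) &\le (1 - b\theta)^t\, \xi_G(v, p^0_a) + \frac{\lambda_E}{b\theta} && (44)\\
&\le e^{-b\theta t} K + O(\Delta^{-2\varepsilon} \ln \Delta) \le 2K. && (45)
\end{aligned}$$

• Entropy. From Lemmas 8.5 and 8.7 we get that

$$\begin{aligned}
h(v, p^t) &\ge h(v, p^0) - \theta C \sum_{t' < t} \xi_G(v, p^{t'}_a) - t\lambda_H\\
&\ge (\ln \Delta - \ln K) - \theta C \sum_{t' < t} \Bigl((1 - b\theta)^{t'}\, \xi_G(v, p^0_a) + \frac{\lambda_E}{b\theta}\Bigr) - t\sqrt{p^* \ln^3 \Delta}\\
&\ge (\ln \Delta - \ln K) - \theta C \Bigl(\frac{K}{b\theta} + \frac{t\lambda_E}{b\theta}\Bigr) - t\sqrt{p^* \ln^3 \Delta}\\
&\ge (\ln \Delta - \ln K) - \frac{KC}{b} - \frac{tC\theta}{b} \cdot O(\Delta^{-2\varepsilon} \ln \Delta) - t\sqrt{p^* \ln^3 \Delta}.
\end{aligned}$$

For $t \le T = \frac{2\varepsilon}{a\theta} \ln \Delta$, we get

$$\begin{aligned}
h(v, p^t) &\ge (\ln \Delta - \ln K) - \frac{KC}{b} - \frac{2\varepsilon \ln \Delta}{a \cdot b}\, O(\Delta^{-2\varepsilon} \ln^2 \Delta) - O(\Delta^{-1/8} \ln^{2.5} \Delta)\\
&\ge (1 - \varepsilon) \ln \Delta.
\end{aligned}$$

Footnote 6: Given a system $x_{t+1} \le \alpha x_t + \beta$ for $\alpha < 1$, we know that $x_t \le \alpha^t x_0 + \frac{\beta}{1 - \alpha}$.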


By (43), the degree of all surviving vertices falls below $\Delta^{1-\varepsilon}$ after $T$ rounds, whence we can color
them using $\Delta^{1-\varepsilon}$ more colors. Hence the total number of colors used is $s + \Delta^{1-\varepsilon} = O(\frac{\Delta}{K}) = O(\frac{\Delta C}{\ln \Delta})$.
It remains to bound the parameter $C$: since the m.o. r.v.s either take on value $0$ or $2r$, we
know that $C = \ln(2r) = \ln 2 + \ln r$. This means the number of colors used is

$$O\Bigl(\frac{\Delta}{\ln \Delta} \ln r\Bigr).$$

This completes the proof of Theorem 6.1.
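
Two pieces of bookkeeping in this subsection are easy to check numerically: the geometric-series bound for a recurrence of the form $x_{t+1} \le \alpha x_t + \beta$ that underlies the energy bound, and the value $\ln(2r)$ of $\mathbb{E}[M \ln M]$ for a mean-one random variable taking values in $\{0, 2r\}$. The following is a minimal sketch with illustrative numbers that are not the paper's parameters; the function names are hypothetical, and reading $C$ as $\mathbb{E}[M \ln M]$ is an assumption based on property (P3) in §9.4.

```python
import math


def iterate_recurrence(x0: float, a: float, b: float, steps: int) -> list[float]:
    """Iterate x_{t+1} = a * x_t + b, the extremal case of x_{t+1} <= a x_t + b."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + b)
    return xs


def closed_form_bound(x0: float, a: float, b: float, t: int) -> float:
    """Geometric-series bound x_t <= a^t * x_0 + b / (1 - a), for 0 <= a < 1."""
    return a ** t * x0 + b / (1 - a)


def mean_one_entropy(values: list[float], probs: list[float]) -> float:
    """E[M ln M] for a discrete mean-one r.v. M, with the convention 0 ln 0 = 0."""
    assert math.isclose(sum(probs), 1.0)
    assert math.isclose(sum(v * q for v, q in zip(values, probs)), 1.0)
    return sum(q * v * math.log(v) for v, q in zip(values, probs) if v > 0)


# Illustrative numbers only.
xs = iterate_recurrence(x0=5.0, a=0.9, b=0.01, steps=200)
recurrence_ok = all(
    x <= closed_form_bound(5.0, 0.9, 0.01, t) + 1e-12 for t, x in enumerate(xs)
)

r = 7
# M = 2r with probability 1/(2r), and M = 0 otherwise.
c_value = mean_one_entropy([0.0, 2.0 * r], [1 - 1 / (2 * r), 1 / (2 * r)])
```

Here `c_value` agrees with `math.log(2 * r)`, matching the value of $C$ used above.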

9 $K_r$-free Graphs

The above analysis was tailored for graphs where each neighborhood is $r$-colorable. To color $K_r$-free
graphs, we use different modifiers which give a weaker guarantee of $O(\frac{\Delta}{\ln \Delta}(r^2 + r \ln \ln \Delta))$ colors
as claimed in Theorem 6.2. Since a coloring algorithm with $s$ colors gives an independent set of
size $n/s$, this result matches Shearer's bound for independent sets for values of $r \in O(\ln \ln \Delta)$.

In this section, we will assume that $r \le c'\sqrt{\log \Delta}$ for some suitably small constant $c'$; for values
of $r \ge c'\sqrt{\log \Delta}$, the quantity $O(r^2 + r \ln \ln \Delta)$ is $\Omega(\ln \Delta)$, and the trivial $\Delta$-coloring satisfies
Theorem 6.2. Again, we assume that $\Delta$ is a suitably large constant.

9.1 The Algorithm 

The algorithm in this case is very similar in structure to that in §6.3. The only difference is in the 
modifiers: we replace Step (5) of that algorithm by the following: 

5'. For each pair $(w, \gamma)$, generate modifiers $M^w_{(v,\gamma)}$ for all $v \sim w$ as follows:

• If $(w, \gamma) \notin T$ (i.e., $A_{(w,\gamma)} = 0$) then $M^w_{(v,\gamma)} = 1$ for all $v \in N(w)$.
• Else, if $(w, \gamma) \in T$, then let $H = G[N(w)]$ be the graph induced on the neighbors of $w$.
Use Theorem 9.2 on this graph $H$, with $c = \frac{1}{4}$, and with values $\{p(v, \gamma)\}_{v \in V(H)}$. This

generates modifiers Af// ^ for each v € V(H) = N(w), and then define 

m.o.r.v. m.o.r.v. 


We emphasize that we invoke the procedure in Theorem 9.2 once for each $(w, \gamma) \in T$. A few
comments on the new modifiers:

• By Theorem 9.2(P3) and the fact that $c = \frac{1}{4}$, each modifier has

$$k(M^w_{(v,\gamma)}) \le O\Bigl(r^2 + r \ln \Bigl(\sum_{v \sim w} p(v, \gamma)\Bigr)\Bigr)$$

as long as $(w, \gamma)$ is a tentatively chosen pair; $k(M^w_{(v,\gamma)}) = 0$ if $(w, \gamma)$ is not tentatively chosen.
But the pair $(w, \gamma)$ is tentatively chosen with probability $\theta p_c(w, \gamma)$, and $p_c(w, \gamma) \ne 0$ implies
that the total probability $\sum_{v \sim w} p(v, \gamma)$ is at most $100 \ln \Delta$. This implies that $k(M^w_{(v,\gamma)}) \le O(r^2 + r \ln \ln \Delta)$.

• The maximum value of the modifier is given by Theorem 9.2(P2), and again using the bound
on total probability of any neighborhood, we get that

$$\kappa(M^w_{(v,\gamma)}) \le O(\ln \Delta)^{r} \cdot 2^{O(r^2)}. \tag{46}$$

If we define $\kappa := \max_{w,v,\gamma} \kappa(M^w_{(v,\gamma)})$, we can use this to again define $p^* := 2\kappa\hat{p}$; just the
value of $\kappa$ has changed. In the analysis for the $r$-local-colorability case, we used that $p^*$ was
only greater than $\hat{p}$ by a $\kappa = 2r \le \Delta^{\varepsilon}$ factor. This time $\kappa$ is given by (46), but for values
$r \le c'\sqrt{\ln \Delta}$ for some suitably small constant $c'$, $\kappa$ is again at most $\Delta^{\varepsilon}$ and we can use $p^* \le \hat{p}\Delta^{\varepsilon}$ as
before.


The runtime: Constructing the modifier requires $\mathrm{poly}(\Delta) \cdot r$ time, and we construct a modifier for
each vertex at most $T$ times. Hence, this algorithm runs in $O(n \cdot r \cdot \mathrm{poly}(\Delta))$ time.

9.2 The Altered Parameters 


The analysis remains very similar, so we only indicate the changes. The parameters change
slightly from §7.1: we now set $c := \frac{1}{4}$ (as mentioned in the algorithm description), and set $b :=
a - c - \varepsilon$. This affects the value of $K$. Moreover, since the modifiers $M^w_{(v,\gamma)}$ change, the value of
$C = \max_{w,v,\gamma} k(M^w_{(v,\gamma)})$ becomes $O(r^2 + r \ln \ln \Delta)$ by the discussion above. To summarize, here
are the new parameters; $\kappa$, $c$, $b$, $C$ and $K$ differ from their counterparts in §7.1.

$$\begin{array}{lll}
\varepsilon := 1/100 & \theta := \Delta^{-1/4+2\varepsilon} & s := |L| := \Delta/K \\
\kappa := 2^{O(r^2)} \cdot \ln(\Delta)^{O(r)} & \hat{p} := \Delta^{-3/4-5\varepsilon} & p^* := \kappa\hat{p} \\
a := 1/2 - 3\varepsilon & c := 1/4 & b := a - c - \varepsilon \\
C := O(r^2 + r \ln \ln \Delta) & T := \dfrac{2\varepsilon}{a \cdot \theta} \ln \Delta & K := \dfrac{(b/4)\,\varepsilon \ln \Delta}{C}
\end{array}$$

9.3 The Analysis


Upon closer inspection, the proofs of invariants (InvP), (InvD) and (InvH) go through verbatim,
since they do not use any properties of the modifiers being used. Only the analysis of the invariant
(InvE) bounding the energy needs to be changed. In particular, Claim 8.9 no longer holds, and
we must use the slightly weaker claim below (we will defer the proof to the end of the section).

Claim 9.1 For any $v$, $\mathbb{E}[\xi_G(v, p'_a)] = \sum_{u \in G} \mathbb{E}[\xi(uv, p'_a)] \le (1 + (c + \varepsilon)\theta)\, \xi_G(v, p_a)$.

Combining Claims 9.1 and 8.10,

$$\sum_{u \in G'} \mathbb{E}[\xi(uv, p'_a)] \le (1 - (a - c - \varepsilon)\theta) \sum_{u \in G} \xi(uv, p_a) = (1 - b\theta) \sum_{u \in G} \xi(uv, p_a),$$

since we redefined $b$ to be $a - c - \varepsilon$. This is the analog of (37); we now need to show the concentration.
And indeed, the argument in §8.4.1 is almost independent of the modifiers, except for the use of
Claim 8.9 in the proof of Claim 8.12; we can now use Claim 9.1 instead and get identical results up
to changes in the constants (which are absorbed in the whp$_\Delta$ claims). This proves that invariant
(InvE) also holds. The rest of the analysis is unchanged for the new modifiers. Finally, the
number of colors used is $O(\frac{\Delta}{K}) = O(\frac{\Delta C}{\ln \Delta})$ again; plugging in the value of $C$ calculated above gives
Theorem 6.2.

It only remains to prove Claim 9.1, which we do next, and to give the construction of the new
modifiers, which appears in §9.4.

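Proof of Claim 9.1: We begin as in the proof of Claim 8.9. Using Fact 8.11, we can replace the
stopped product with unstopped products, i.e., for any nodes $u, v \in G$ and color $\gamma \in L$,

$$\mathbb{E}[p'_a(u, \gamma)\, p'_a(v, \gamma)] \le p_a(u, \gamma)\, p_a(v, \gamma)\; \mathbb{E}\Bigl[\prod_{w \sim u} M^w_{(u,\gamma)} \prod_{w \sim v} M^w_{(v,\gamma)}\Bigr].$$

Using independence of the random variables,

$$\mathbb{E}\Bigl[\prod_{w \sim u} M^w_{(u,\gamma)} \prod_{w \sim v} M^w_{(v,\gamma)}\Bigr] = \prod_{w \sim u,\, w \not\sim v} \mathbb{E}\bigl[M^w_{(u,\gamma)}\bigr] \cdot \prod_{w \sim v,\, w \not\sim u} \mathbb{E}\bigl[M^w_{(v,\gamma)}\bigr] \cdot \prod_{w :\, uvw \in \triangle} \mathbb{E}\bigl[M^w_{(u,\gamma)} M^w_{(v,\gamma)}\bigr]. \tag{47}$$

The expectations in the first two products are 1, so focus on the expectations in the last product:

$$\mathbb{E}\bigl[M^w_{(u,\gamma)} M^w_{(v,\gamma)}\bigr] = \mathbb{E}\bigl[M^w_{(u,\gamma)} M^w_{(v,\gamma)} \mid w \notin T_\gamma\bigr] \cdot \Pr[w \notin T_\gamma] + \mathbb{E}\bigl[M^w_{(u,\gamma)} M^w_{(v,\gamma)} \mid w \in T_\gamma\bigr] \cdot \Pr[w \in T_\gamma].$$

When $w \notin T_\gamma$, then all modifiers $M^w_{(\cdot,\gamma)}$ have value 1. On the other hand, if $w \in T_\gamma$, then either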

%U) = aL ( w ' p - half ) and M (y,7) M (u,7) = °> or else ^(tu.7) = 0 ( also W 'P- half ) and M (™, 7 ) M (™ >7 ) = 
4M^ 7 )M^ 7 ). Moreover, Pr[u> G T v \ = 9p c (w, 7 ) < 9p a (w, 7 ). Plugging this into the expression 
above, 


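$$\begin{aligned}
\mathbb{E}\bigl[M^w_{(u,\gamma)} M^w_{(v,\gamma)}\bigr] &= 1 \cdot (1 - \theta p_c(w, \gamma)) + \tfrac{1}{2} \cdot \mathbb{E}\bigl[4\, \widetilde{M}^w_{(u,\gamma)} \widetilde{M}^w_{(v,\gamma)}\bigr] \cdot \theta p_c(w, \gamma)\\
&\le 1 + 2\, \mathbb{E}\bigl[\widetilde{M}^w_{(v,\gamma)} \widetilde{M}^w_{(u,\gamma)}\bigr] \cdot \theta p_a(w, \gamma),
\end{aligned}$$

where $\widetilde{M}^w_{(\cdot,\gamma)}$ denotes the modifier returned by Theorem 9.2 for the pair $(w, \gamma)$.

Taking the product over all $w$ that are common neighbors of $u, v$,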


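$$\begin{aligned}
\prod_{w :\, uvw \in \triangle} \Bigl(1 + 2\, \mathbb{E}\bigl[\widetilde{M}^w_{(u,\gamma)} \widetilde{M}^w_{(v,\gamma)}\bigr] \cdot \theta p_a(w, \gamma)\Bigr) &= 1 + \sum_{w :\, uvw \in \triangle} 2\theta p_a(w, \gamma) \cdot \mathbb{E}\bigl[\widetilde{M}^w_{(u,\gamma)} \widetilde{M}^w_{(v,\gamma)}\bigr] + o(\Delta^{-2-\varepsilon})\\
&\le (1 + \varepsilon\theta) + \sum_{w :\, uvw \in \triangle} 2\theta p_a(w, \gamma) \cdot \mathbb{E}\bigl[\widetilde{M}^w_{(u,\gamma)} \widetilde{M}^w_{(v,\gamma)}\bigr].
\end{aligned}$$

In the second expression, the second-order terms in the product expansion are bounded by $(\theta \cdot \hat{p} \cdot
\Delta^{\varepsilon})^2 \in o(\Delta^{-2-\varepsilon})$, since $p_a(v, \gamma) \le \hat{p}$ and $\widetilde{M}^w_{(u,\gamma)} \le \kappa \le \Delta^{\varepsilon}$ (by the comments at the end of §9.1).
The subsequent inequality uses that $o(\Delta^{-2-\varepsilon}) \le \varepsilon\theta$ for large enough $\Delta$.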

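Now summing over all colors $\gamma$ and over all $u \sim v$, we get that for vertex $v$,

$$\begin{aligned}
\sum_{u \sim v} \mathbb{E}[\xi(uv, p'_a)] &\le \sum_{u \sim v} \sum_{\gamma} p_a(u, \gamma)\, p_a(v, \gamma) \Bigl[(1 + \varepsilon\theta) + \sum_{w :\, uvw \in \triangle} 2\theta p_a(w, \gamma)\, \mathbb{E}\bigl[\widetilde{M}^w_{(u,\gamma)} \widetilde{M}^w_{(v,\gamma)}\bigr]\Bigr]\\
&\le (1 + \varepsilon\theta)\, \xi_G(v, p_a) + \sum_{\gamma} \sum_{u \sim v} \sum_{w :\, uvw \in \triangle} 2\theta\, p_a(u, \gamma)\, p_a(v, \gamma)\, p_a(w, \gamma)\, \mathbb{E}\bigl[\widetilde{M}^w_{(u,\gamma)} \widetilde{M}^w_{(v,\gamma)}\bigr].
\end{aligned}$$

And now summing over all $w \sim v$ instead of only $w : uvw \in \triangle$, and interchanging the summations,

$$\le (1 + \varepsilon\theta)\, \xi_G(v, p_a) + \sum_{\gamma} \sum_{w \sim v} \theta p_a(w, \gamma) \Bigl(\sum_{u \sim v} p_a(v, \gamma)\, p_a(u, \gamma) \cdot 2\, \mathbb{E}\bigl[\widetilde{M}^w_{(u,\gamma)} \widetilde{M}^w_{(v,\gamma)}\bigr]\Bigr).$$

Applying Theorem 9.2(P1) on the inner summands,

$$\begin{aligned}
&\le (1 + \varepsilon\theta)\, \xi_G(v, p_a) + \sum_{\gamma} \sum_{w \sim v} \theta p_a(w, \gamma) \cdot c\, p_a(v, \gamma)\\
&\le (1 + (c + \varepsilon)\theta)\, \xi_G(v, p_a).
\end{aligned}$$

This completes the proof of Claim 9.1. ■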

9.4 Constructing a $K_r$-free Graph Modifier

Theorem 9.2 Given an integer $t \le r$, a $K_t$-free graph $H$, a constant $c < 1$ called the "contraction"
parameter, and values $p : V(H) \to [0,1]$, we can construct modifiers $\{M(v)\}_{v \in V}$ which are m.o. r.v.s
that satisfy the following properties:

(P1) For every vertex $v \in V(H)$, $\sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}[M(u)M(v)] \le c \cdot p(v)$.

(P2) The maximum value of any $M(v)$ is at most $\bigl(\frac{8 p(V)}{c}\bigr)^{t-2} \cdot 16^{\binom{t-2}{2}}$.

(P3) $\mathbb{E}[M(v) \log M(v)] = O(t^2 + t \log(p(V)/c))$.

Moreover, this construction works in time $\mathrm{poly}(|V(H)|) \cdot r$.

Remark: The graph $H$ should be viewed as the neighborhood of some vertex $w$.

Proof: The proof is via induction. The base case is when $t = 2$; the graph $H$ is $K_2$-free (i.e., it
has no edges), so we return $M_2(v) = 1$ for all $v$. This "trivial modifier" satisfies the properties
above. Else $t \ge 3$; in this case we recursively build the modifier. Let $V = V(H)$ be the vertex set
of the graph.


1. Define $q(v) = p(v)/c$. Sample a set $X \subseteq V$ of vertices using a dependent sampling technique
of Gandhi et al. [GKPS06] satisfying the following properties:

(i) $|X| \in \{\lfloor q(V) \rfloor, \lceil q(V) \rceil\}$,
(ii) $\Pr[v \in X] = q(v)$, and
(iii) $\Pr[N(v) \cap X = \emptyset] \le \prod_{u \in N(v)} (1 - q(u))$.

Assumption 9.3 implies that $q(v) \le 1$ for all $v$, so the step is well-defined.

2. Let $m := |X|$ and let $X = \{x_1, x_2, \ldots, x_m\}$. Define $V_0 := V - N(X)$ and let $V_i := N(x_i) -
(\bigcup_{j < i} V_j)$. This partitions the vertex set $V$ into $m + 1$ sets. For $i \in \{1, \ldots, m\}$ the induced
graphs $H[V_i]$ are $K_{t-1}$-free; however, $H[V_0]$ might still contain a $K_{t-1}$.

3. Let $A := \{i \mid p(V_i) \ge c\}$ and $B := [1 \ldots m] \setminus A$. Let $\alpha$ be an r.v. with $\Pr[\alpha = i] = w_i$, where

$$w_i = \begin{cases}
\frac{3}{4} & \text{if } i = 0,\\[2pt]
\frac{1}{8} \cdot \frac{p(V_i)}{\sum_{j \in A} p(V_j)} & \text{if } i \in A,\\[2pt]
\frac{1}{8|B|} & \text{if } i \in B.
\end{cases}$$

4. For each $i \in \{1, \ldots, m\}$, recursively construct a modifier $M'_i$ on the induced $K_{t-1}$-free graph
$H_i := H[V_i]$ using contraction parameter $c' = \frac{1}{2}c$ and values $p'(v) = p(v)/w_i$ for all $v \in V_i$.
(Assumption 9.3(ii) implies that the new values $p'$ satisfy the requirements of the Theorem.)
Define $M'_0$ to be the trivial modifier assigning value 1 to all $v \in V_0$. Define

$$M(v) = \sum_{i=0}^{m} \mathbf{1}_{(\alpha = i)} \cdot \mathbf{1}_{(v \in V_i)} \cdot \frac{1}{w_i} \cdot M'_i(v).$$

In other words, the modifier picks $\alpha$, defines $M(v) = 0$ for all $v \in V \setminus V_\alpha$, and returns a
scaled-up version of the recursively constructed modifier for $V_\alpha$.
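
To make the choice of the selection probabilities in step 3 concrete, here is a minimal sketch that computes the weights from the masses $p(V_0), \ldots, p(V_m)$ and rescales each part by $1/w_i$ as in step 4. It assumes the weights are $3/4$ for part 0, a total of $1/8$ split proportionally to mass over the parts in $A$, and a total of $1/8$ split evenly over the parts in $B$. The function name and the input masses are hypothetical, and the sketch simply rejects the corner case where $A$ or $B$ is empty.

```python
def selection_weights(part_mass: list[float], c: float) -> list[float]:
    """Return [w_0, w_1, ..., w_m] for parts with masses p(V_0), ..., p(V_m).

    Assumes both A (heavy parts) and B (light parts) are non-empty; otherwise
    the weights would not sum to 1, a corner case this sketch does not handle.
    """
    m = len(part_mass) - 1
    heavy = [i for i in range(1, m + 1) if part_mass[i] >= c]   # the set A
    light = [i for i in range(1, m + 1) if part_mass[i] < c]    # the set B
    if not heavy or not light:
        raise ValueError("sketch assumes A and B are both non-empty")
    heavy_mass = sum(part_mass[i] for i in heavy)
    w = [0.75] + [0.0] * m
    for i in heavy:
        w[i] = part_mass[i] / (8 * heavy_mass)
    for i in light:
        w[i] = 1 / (8 * len(light))
    return w


# Hypothetical masses p(V_0), ..., p(V_4) and contraction parameter c.
masses = [0.3, 0.5, 0.05, 0.25, 0.1]
c = 0.25
w = selection_weights(masses, c)
# Rescaled masses p'(V_i) = p(V_i) / w_i handed to the recursive calls.
scaled = [masses[i] / w[i] for i in range(len(masses))]
```

On this input the weights sum to 1 and every rescaled mass stays below $8\,p(V)$, which is the kind of control the analysis of the maximum modifier value relies on.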


In order to ensure the algorithm is well-defined, we need some assumptions.

Assumption 9.3 The construction above satisfies:

(i) $q(v) = p(v)/c \le 1$ for all $v \in V$.

(ii) $p'(v) = p(v)/w_i \le 1$ for all $v \in V$.


We now prove that M satisfies properties (P1)-(P3). 






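9.4.1 Satisfying property (P1)

We need to show that for $v \in V$, we have

$$\sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}[M(u)M(v)] \le c \cdot p(v).$$

Note that the expectation is over the random choice of $X$, the choice of $\alpha$, and the internal
randomness of the modifiers (denoted as IR).

$$\begin{aligned}
&\sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}_{X,\alpha,\mathrm{IR}}[M(u)M(v)] && (48)\\
&\quad= \sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}_{X,\alpha,\mathrm{IR}}[M(u)M(v) \mid v \in N(X)]\, \Pr[v \in N(X)]\\
&\qquad+ \sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}_{X,\alpha,\mathrm{IR}}[M(u)M(v) \mid v \notin N(X)]\, \Pr[v \notin N(X)]\\
&\quad\le \sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}_{X,\alpha,\mathrm{IR}}[M(u)M(v) \mid v \in N(X)]\\
&\qquad+ \sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}_{X,\alpha,\mathrm{IR}}[M(u)M(v) \mid v \notin N(X)]\, \Pr[v \notin N(X)]. && (49)
\end{aligned}$$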

U~V 


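Let us concentrate on the first summand in (49), and condition on some $X$ such that $v \in N(X)$:

$$\begin{aligned}
&\sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}_{\alpha,\mathrm{IR}}[M(u)M(v) \mid v \in N(X), X]\\
&\quad= \sum_{u \sim v} 2p(u)p(v)\, \mathbb{E}_{\alpha,\mathrm{IR}}\Bigl[\sum_{i=1}^{m} \mathbf{1}_{(\alpha = i)} \cdot \mathbf{1}_{(u \in V_i)} \tfrac{1}{w_i} M'_i(u) \cdot \mathbf{1}_{(\alpha = i)} \cdot \mathbf{1}_{(v \in V_i)} \tfrac{1}{w_i} M'_i(v) \,\Big|\, v \in N(X), X\Bigr].
\end{aligned}$$

Since the internal randomness for the modifiers at the next level and $\alpha$ are independent, we get

$$\sum_{u \sim v} 2p(u)p(v) \sum_{i=1}^{m} \mathbb{E}_{\alpha}\bigl[\mathbf{1}_{(\alpha = i)} \mid v \in N(X), X\bigr]\; \mathbb{E}_{\mathrm{IR}}\Bigl[\mathbf{1}_{(u \in V_i)} \tfrac{1}{w_i} M'_i(u) \cdot \mathbf{1}_{(v \in V_i)} \tfrac{1}{w_i} M'_i(v) \,\Big|\, v \in N(X), X\Bigr].$$

Rearranging the sum, and noting that $\mathbb{E}_{\alpha}[\mathbf{1}_{(\alpha = i)} \mid X] = \Pr[\alpha = i \mid X] = w_i$, we get

$$\begin{aligned}
&\sum_{i=1}^{m} \mathbf{1}_{(v \in V_i)} \sum_{u \sim v,\, u \in V_i} 2 w_i\, \frac{p(u)}{w_i}\, \frac{p(v)}{w_i}\, \mathbb{E}_{\mathrm{IR}}[M'_i(u) M'_i(v)]\\
&\quad= \sum_{i=1}^{m} \mathbf{1}_{(v \in V_i)}\, w_i \sum_{u \sim v,\, u \in V_i} 2 p'(u) p'(v)\, \mathbb{E}_{\mathrm{IR}}[M'_i(u) M'_i(v)].
\end{aligned}$$

Applying the induction hypothesis on $H[V_i]$ with values $p'$,

$$\le \sum_{i=1}^{m} \mathbf{1}_{(v \in V_i)}\, w_i\, c'\, p'(v) \le \frac{c}{2}\, p(v).$$

We now turn to the second summand in (49). In particular, consider the expectation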

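$$\begin{aligned}
\mathbb{E}_{X,\alpha,\mathrm{IR}}[M(u)M(v) \mid v \notin N(X)] &= \Pr[\alpha = 0] \cdot \mathbb{E}_{X,\mathrm{IR}}[M(u)M(v) \mid v \notin N(X), \alpha = 0]\\
&\quad+ \Pr[\alpha \ne 0] \cdot \mathbb{E}_{X,\alpha,\mathrm{IR}}[M(u)M(v) \mid v \notin N(X), \alpha \ne 0].
\end{aligned}$$

If we ensure that the value $w_0$ is chosen independently of $X$, we get that $\{\alpha = 0\}$ is independent of
$X, \mathrm{IR}$. Moreover, since $v \notin N(X)$, it lies in $V_0$. By construction, $M(u)M(v)$ will be non-zero only
if $u$ also lies in $V_0$, which causes the second summand above to disappear, and gives

$$\begin{aligned}
\mathbb{E}_{X,\alpha,\mathrm{IR}}[M(u)M(v) \mid v \notin N(X)] &= \Pr[\alpha = 0] \cdot \mathbb{E}_{X,\mathrm{IR}}[M(u)M(v) \mid v \notin N(X), \alpha = 0]\\
&= \Pr[\alpha = 0] \cdot \mathbb{E}_{X}\Bigl[\tfrac{1}{w_0} \mathbf{1}_{(u \notin N(X))} \cdot \tfrac{1}{w_0} \mathbf{1}_{(v \notin N(X))} \,\Big|\, v \notin N(X)\Bigr]\\
&= \Pr[\alpha = 0] \cdot \frac{1}{w_0^2} \Pr[u \notin N(X) \mid v \notin N(X)]\\
&\le \Pr[\alpha = 0] \cdot \frac{1}{w_0^2} = \frac{1}{w_0}.
\end{aligned}$$

This shows that the second summand of (49) is upper bounded by

$$\begin{aligned}
\sum_{u \sim v} p(u)p(v)\, \frac{1}{w_0} \Pr[v \notin N(X)] &\le \frac{p(v)}{w_0} \Bigl(\sum_{u \sim v} p(u)\Bigr) \prod_{u \sim v} (1 - q(u))\\
&\le \frac{p(v)}{w_0} \cdot c \cdot \Bigl(\sum_{u \sim v} q(u)\Bigr) \exp\Bigl(-\sum_{u \sim v} q(u)\Bigr)\\
&\le \frac{p(v)}{w_0} \cdot c \cdot \frac{1}{e} \le \frac{c}{2}\, p(v).
\end{aligned}$$

In the first inequality we used the negative correlation property of the dependent sampling scheme
of Gandhi et al., in the second we used the definition of $q(u) = p(u)/c$, and in the third inequality
we used that $x \exp(-x) \le \frac{1}{e}$ for all $x$. The final inequality uses $w_0 = 3/4 \ge 2/e$. Hence the two
summands sum up to at most $c\, p(v)$, proving Property (P1).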
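
The two elementary facts used in the last two inequalities, namely $x \exp(-x) \le 1/e$ and $w_0 = 3/4 \ge 2/e$, can be confirmed numerically. The following is a small sketch over a grid of values of $x$; the grid and the function name are arbitrary illustrative choices.

```python
import math

W0 = 0.75  # probability of the outcome alpha = 0


def second_summand_factor(x: float) -> float:
    """x * exp(-x) / w_0: the factor multiplying c * p(v) in the final chain."""
    return x * math.exp(-x) / W0


# Evaluate on a grid of x in [0, 20]; the maximum of x * exp(-x) is at x = 1.
grid = [k / 1000 for k in range(0, 20001)]
worst = max(second_summand_factor(x) for x in grid)
```

The worst case over the grid is attained at $x = 1$ and stays below $1/2$, which is what the bound $\frac{c}{2}\, p(v)$ on the second summand needs.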


9.4.2 Satisfying Properties (P2) and (P3)

Claim 9.4 The new values satisfy the following:

(i) $w_i \ge \frac{c}{8 p(V)}$ for all $i \in \{1, \ldots, m\}$.

(ii) $p'(V_i) \le 8 p(V)$.

Proof: For (i), for $i \in A$, $w_i = \frac{1}{8} \cdot \frac{p(V_i)}{\sum_{j \in A} p(V_j)} \ge \frac{c}{8 p(V)}$ by the definition of $A$. Moreover, for $i \in B$,
$w_i \ge \frac{1}{8|X|} \ge \frac{c}{8 p(V)}$. (Here we ignore the issues caused by $|X|$ being an integer adjacent to $q(V)$
rather than being equal to it.) Note that (i) does not make any claims about $w_0$.

For (ii), $p'(V_0) = \frac{4}{3} p(V_0) \le 8 p(V)$. For $i \in A$,

$$p'(V_i) = \frac{p(V_i)}{w_i} = 8 \sum_{j \in A} p(V_j) \le 8 p(V).$$

For $i \in B$, $p'(V_i) = \frac{p(V_i)}{w_i} \le 8 p(V)$ by part (i). ■

To prove property (P2), by the IH the maximum value of the recursively constructed modifier $M'_i$
is

$$\Bigl(\frac{8 p'(V_i)}{c'}\Bigr)^{t-3} \cdot 16^{\binom{t-3}{2}} \le \Bigl(\frac{8 p(V)}{c}\Bigr)^{t-3} \cdot 16^{t-3} \cdot 16^{\binom{t-3}{2}} = \Bigl(\frac{8 p(V)}{c}\Bigr)^{t-3} \cdot 16^{\binom{t-2}{2}},$$

using the definition of $c'$ and Claim 9.4(ii). If we consider $i \in \{1, \ldots, m\}$, then scaling up by $1/w_i$
causes the maximum value to be at most

$$\frac{1}{w_i} \times \Bigl(\frac{8 p(V)}{c}\Bigr)^{t-3} \cdot 16^{\binom{t-2}{2}} \le \Bigl(\frac{8 p(V)}{c}\Bigr)^{t-2} \cdot 16^{\binom{t-2}{2}}$$

by Claim 9.4(i). If we consider $i = 0$, then using that $M'_0 = 1$, the maximum value is $4/3$, which is
only smaller (since $t \ge 3$).

For property (P3), observe that $\mathbb{E}[M \log M] \le \mathbb{E}[M] \cdot \log M_{\max}$, where $M_{\max}$ is the maximum value
$M$ takes. But if $M(v)$ is a m.o. r.v. then $\mathbb{E}[M(v)] = 1$, so $\mathbb{E}[M(v) \log M(v)]$ is bounded by the
logarithm of the expression in property (P2). ■
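
The induction for (P2) hinges on a binomial identity for the exponent of 16, and (P3) then follows by taking logarithms. The sketch below checks both under an assumption that should be read as such: that the bound in (P2) has the form $(8p(V)/c)^{t-2} \cdot 16^{\binom{t-2}{2}}$, which is the form the inductive step with $c' = c/2$ and $p'(V_i) \le 8p(V)$ forces. The function names and the variable `ratio`, standing for $8p(V)/c$, are hypothetical.

```python
from math import comb, log


def p2_log_bound(t: int, ratio: float) -> float:
    """ln of ratio^(t-2) * 16^C(t-2, 2), where ratio stands for 8 p(V) / c."""
    return (t - 2) * log(ratio) + comb(t - 2, 2) * log(16)


def inductive_step_holds(t: int) -> bool:
    """Exponent identity behind the induction: C(t-3, 2) + (t-3) = C(t-2, 2)."""
    return comb(t - 3, 2) + (t - 3) == comb(t - 2, 2)


# The identity holds for every level t >= 3 of the recursion.
checks = [inductive_step_holds(t) for t in range(3, 50)]

# The logarithm of the (P2) bound is O(t^2 + t log(ratio)), as (P3) requires.
ratio = 400.0
growth_ok = all(
    p2_log_bound(t, ratio) <= t * log(ratio) + t * t * log(16) for t in range(2, 50)
)
```

The extra factor $16^{t-3}$ in the inductive step comes from replacing $8p'(V_i)/c'$ by $16 \cdot 8p(V)/c$, and the identity shows that it is absorbed exactly by moving from $\binom{t-3}{2}$ to $\binom{t-2}{2}$.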


9.4.3 Satisfying the Assumptions 


We still have to address the issue of the validity of the assumptions in Assumption 9.3. We assume
we start off with a $K_r$-free graph with $p(V) \le O(\log \Delta)$, and a contraction parameter $c = \frac{1}{4}$ (say).
Let $p_t(v)$ be the probability values at some stage where the current vertex set is $V_t$ (which is
$K_t$-free); then by Claim 9.4 and algebra,

(a) $p_t(V_t) \le 8^{r-t} p(V)$, and

(b) for all v € V 1 , p f (v) < p(v) ■ p ^ 2 ^ . 

Consequently, $p_t(v) \le p(v) \cdot (p(V)^r \cdot 16^{r^2})$, and if we start off with $p(v) \le p^*$ and $r \ll \sqrt{\log \Delta}$,
we ensure assumption (ii) that $p_t(v) \le 1$ for all stages $t$. Assumption (i) demands $q_t(v) = p_t(v)/c_t =
O(p_t(v) \cdot 2^r) \le 1$, where $c_t$ is the contraction parameter at stage $t$; this is satisfied again by the same conditions.


A Probabilistic Tools and Useful Lemmas 

A.1 Concentration Bounds

The following large-deviation bound is standard; see, e.g., [AS92].

Theorem A.1 (A Large Deviation Bound) For independent $[0, m]$-bounded random variables
$X_1, X_2, \ldots$, with $X := \sum_i X_i$ having mean $\mathbb{E}[X] \le \mu$, given any $\lambda \ge 0$,

$$\Pr\bigl[|X - \mathbb{E}[X]| \ge \lambda\bigr] \le 2 \exp\Bigl\{-\frac{\lambda^2}{m(2\mu + \lambda)}\Bigr\}.$$

In particular, this probability is at most $1/\mathrm{poly}(\Delta)$ when $\lambda = \Theta(\sqrt{\mu m \ln \Delta} + m \ln \Delta)$.

For a multilinear polynomial $f(x) = f(x_1, x_2, \ldots, x_n)$ with nonnegative coefficients and degree $\le q$,
and $n$ independent random variables $Y = (Y_1, Y_2, \ldots, Y_n)$, we define $\mu_r$ (for every $r \le q$) as follows:

$$\mu_r = \max_{\substack{S \subseteq [n],\, |S| = r\\ S = \{s_1, \ldots, s_r\}}} \Bigl(\frac{\partial^r f}{\partial x_{s_1}\, \partial x_{s_2} \cdots \partial x_{s_r}}\Bigr)\Big|_{x_1 = \mathbb{E}[Y_1], \ldots, x_n = \mathbb{E}[Y_n]}.$$

Building on a long line of work, the following bound is presented by Schudy and Sviridenko [SS12].

Theorem A.2 (Large Deviation for Polynomials) Consider independent $[0, m]$-bounded r.v.s
$X_1, X_2, \ldots$ and let $X := (X_1, X_2, \ldots, X_n)$. Let $f(x) = f(x_1, x_2, \ldots, x_n)$ be a multilinear polynomial
of degree $q$ with non-negative coefficients, and let $f(X)$ have moment parameters $\mu_0, \ldots, \mu_q$.
There exists a universal constant $C = C(q)$ such that

$$\Pr\bigl[|f(X) - \mathbb{E}[f(X)]| \ge \lambda\bigr] \le e^2 \max\Bigl\{\max_{r=1,\ldots,q} \exp\Bigl\{-\frac{\lambda^2}{C \cdot m^r \cdot \mu_0 \mu_r}\Bigr\},\; \max_{r=1,\ldots,q} \exp\Bigl\{-\Bigl(\frac{\lambda}{C \cdot m^r \cdot \mu_r}\Bigr)^{1/r}\Bigr\}\Bigr\}.$$

Proof: Use Theorem 1.2 of the Schudy-Sviridenko paper [SS12] and the observation that any
$[0, m]$-bounded r.v. is moment bounded by parameter $L = m$. ■

Corollary A.3 Consider independent $[0, 1]$-bounded r.v.s $X_1, X_2, \ldots$ and let $X := (X_1, X_2, \ldots, X_n)$.
Let $f(x) = f(x_1, x_2, \ldots, x_n)$ be a multilinear polynomial of degree 2 with non-negative coefficients,
and let $f(X)$ have mean $\mathbb{E}[f(X)] \le \mu_0$ and moment parameters $\mu_1, \mu_2 \le O(1)$. Then

$$\Pr\bigl[|f(X) - \mathbb{E}[f(X)]| \ge \lambda\bigr] \le e^2 \max\Bigl\{\exp\Bigl\{-\Omega\Bigl(\frac{\lambda^2}{\mu_0}\Bigr)\Bigr\},\; \exp\bigl\{-\Omega(\lambda)^{1/2}\bigr\}\Bigr\}.$$

In particular, this probability is at most $1/\mathrm{poly}(\Delta)$ when $\lambda = \Theta(\sqrt{\mu_0 \ln \Delta} + \ln^2 \Delta)$.

A.2 The Lovász Local Lemma

The following theorem essentially follows from Moser and Tardos [MT10].

Theorem A.4 Consider a set of $n$ independent random variables $\mathcal{F} = \{X_i\}_{i=1}^{n}$, and assume that
sampling each r.v. from the underlying distribution can be done in constant time. Given a collection
of $m$ subsets $\{S_j \subseteq \mathcal{F}\}_{j=1}^{m}$ such that the "bad" event $B_j$ is completely determined by the r.v.s in
subset $S_j$, define the degree $d_j = |\{j' \in [m] \mid S_j \cap S_{j'} \ne \emptyset\}|$. Define $p_j := \Pr[B_j]$. Suppose

$$\bigl(\max_j p_j\bigr) \cdot \bigl(\max_j d_j\bigr) \le 1/4;$$

then there is an algorithm running in time $\mathrm{poly}(m, n)$ to find a setting of the random variables $X_i$
such that none of the bad events occur.
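
The algorithm behind this theorem is the resampling procedure of Moser and Tardos: while some bad event holds, pick one and resample exactly the variables it depends on. The following is a minimal sketch of that idea on a toy CNF instance, where the bad event of a clause is that all of its literals are false. The instance, the function names and the resampling budget are illustrative assumptions and are unrelated to the bad events used in the coloring analysis.

```python
import random

Clause = tuple[int, ...]  # literal +i / -i: variable i must be True / False


def violated(clause: Clause, assignment: dict[int, bool]) -> bool:
    """Bad event of a clause: every one of its literals is false."""
    return all(assignment[abs(lit)] != (lit > 0) for lit in clause)


def lll_condition_holds(clauses: list[Clause]) -> bool:
    """Check (max_j p_j) * (max_j d_j) <= 1/4 for clauses on distinct variables."""
    var_sets = [{abs(lit) for lit in cl} for cl in clauses]
    max_p = max(2.0 ** -len(vs) for vs in var_sets)
    # As in the theorem statement, d_j counts j itself.
    max_d = max(sum(1 for other in var_sets if vs & other) for vs in var_sets)
    return max_p * max_d <= 0.25


def moser_tardos(clauses: list[Clause], num_vars: int, seed: int = 0,
                 max_resamples: int = 100_000) -> dict[int, bool]:
    """Resample the variables of a violated clause until no clause is violated."""
    rng = random.Random(seed)
    assignment = {i: rng.random() < 0.5 for i in range(1, num_vars + 1)}
    for _ in range(max_resamples):
        bad = [cl for cl in clauses if violated(cl, assignment)]
        if not bad:
            return assignment
        for lit in bad[0]:  # any selection rule for the bad event works
            assignment[abs(lit)] = rng.random() < 0.5
    raise RuntimeError("resampling budget exhausted")


# Toy instance: four width-4 clauses, so p_j = 1/16 and d_j <= 4.
clauses = [(1, 2, 3, 4), (-1, 5, 6, 7), (-2, -5, 8, -3), (4, -6, -8, 7)]
solution = moser_tardos(clauses, num_vars=8)
```

Resampling only the variables of a violated event, rather than all variables, is what keeps the expected number of resamplings polynomial when the condition of the theorem holds.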

A.3 Auxiliary Lemmas 

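Lemma A.5 Suppose $h(v, p) \ge (1 - \delta) \ln \Delta$ and $p(V_v) \in (1 \pm \nu)$; then $p_a(V_v) \ge 1 - 6(\delta + \nu)$. If
$\xi_G(v, p) \le 2K$ also holds, then $p_c(V_v) \ge 1 - 6(\delta + \nu) - 2\varepsilon$.

Proof: First, we prove the bound on $p_a(V_v)$. Recall that $p_a(v, \gamma) = p(v, \gamma) \cdot \mathbf{1}_{(p(v,\gamma) \le \hat{p})}$. Since any
non-zero probability is at least $1/s \ge 1/\Delta$, the entropy

$$h(v, p) = -\sum_{\gamma} p(v, \gamma) \ln p(v, \gamma) \le \Bigl(\sum_{\gamma :\, p(v,\gamma) \in [1/s,\, \hat{p}]} p(v, \gamma)\Bigr) \ln \Delta + \Bigl(\sum_{\gamma :\, p(v,\gamma) > \hat{p}} p(v, \gamma)\Bigr) \ln \frac{1}{\hat{p}}. \tag{50}$$

Let $B := \sum_{\gamma :\, p(v,\gamma) > \hat{p}} p(v, \gamma)$; since $p(V_v) \in (1 \pm \nu)$, we have that

$$p_a(V_v) = \sum_{\gamma :\, p(v,\gamma) \in [0,\, \hat{p}]} p(v, \gamma) = \sum_{\gamma :\, p(v,\gamma) \in [1/s,\, \hat{p}]} p(v, \gamma) \in (1 \pm \nu) - B.$$

Moreover, $\ln \frac{1}{\hat{p}} = (\frac{3}{4} + 5\varepsilon) \ln \Delta$. Finally, by assumption, $h(v, p) \ge (1 - \delta) \ln \Delta$. Substituting
into (50) and dividing throughout by $\ln \Delta$, and using $\varepsilon = 1/100$, we get

$$(1 - B + \nu) + \bigl(\tfrac{3}{4} + 5\varepsilon\bigr) B \ge 1 - \delta \quad\Longrightarrow\quad B \le 5(\delta + \nu). \tag{51}$$

Hence $p_a(V_v) \in [1 - 6(\delta + \nu),\, 1 + \nu]$, which proves the first part of the claim.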

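Next, the bound on $p_c(V_v)$. Recall that $p_c(v, \gamma) = p_a(v, \gamma) \cdot \mathbf{1}_{(\sum_{u \sim v} p(u,\gamma) \le 100 \ln \Delta)}$. Since $\varepsilon = 1/100$,
the threshold for zeroing out is $\frac{\ln \Delta}{\varepsilon} \ge \frac{K}{\varepsilon}$. Consequently, if $S_v := \{\gamma \mid \sum_{u \sim v} p(u, \gamma) \ge \frac{K}{\varepsilon}\}$, then
$p_c(V_v) \ge p_a(V_v) - \sum_{\gamma \in S_v} p(v, \gamma)$. To bound the latter sum, observe that

$$\sum_{\gamma \in S_v} p(v, \gamma) \le \frac{\varepsilon}{K} \sum_{\gamma \in S_v} p(v, \gamma) \sum_{u \sim v} p(u, \gamma) \le \frac{\varepsilon}{K} \sum_{u} \xi(uv, p) = \frac{\varepsilon}{K}\, \xi_G(v, p) \le 2\varepsilon.$$

Hence $p_c(V_v) \ge 1 - 6(\delta + \nu) - 2\varepsilon$.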